Plasmid vectors for cellular slime moulds of the genus Dictyostelium

ABSTRACT

The present invention relates generally to the fields of molecular biology and the production of recombinant proteins using cellular slime moulds of the genus Dictyostelium. More particularly, the present invention relates to novel strains of the genus Dictyostelium, recombinant plasmids for use with strains of the genus Dictyostelium, and polypeptides which facilitate the extrachromosomal replication of such plasmids in strains of the genus Dictyostelium. In particular, the present invention provides a polypeptide which facilitates the extrachromosomal replication of a recombinant plasmid in Dictyostelium spp., in which the recombinant plasmid includes an origin of replication derived from a Ddp2-like plasmid but lacks functional genes for extrachromosomal replication in wild type Dictyostelium spp. The extrachromosomally replicating plasmids constructed in accordance with the present invention are suitable for carrying a wide variety of genes and promoter sequences for the controlled production of recombinant proteins by the biotechnology industry.

FIELD OF THE INVENTION

The present invention relates generally to the fields of molecular biology and the production of recombinant proteins by the biotechnology industry. More particularly, the present invention relates to novel strains of the genus Dictyostelium, recombinant plasmid vectors for use with strains of the genus Dictyostelium, and polypeptides which facilitate the extrachromosomal replication of such plasmids in strains of the genus Dictyostelium. Such extrachromosomally replicating plasmids, constructed with the art disclosed in this invention, are suitable for carrying a wide variety of genes and promoter sequences for the controlled production of recombinant proteins by the biotechnology industry.

BACKGROUND ART

As is well known in the art, genetic information is encoded on double stranded DNA molecules according to the sequence of four nucleotides containing different bases, adenine (A), thymine (T), cytosine (C) and guanine (G). Blocks of DNA sequences flanking genes often control gene activity by binding regulatory proteins and acting as recognition signals for enzymes of the cell's biosynthetic machinery. Thus each cell contains a web of regulatory molecules which, by binding to specific DNA sequences, control gene activity. Other DNA sequences have crucial functions related to the control of DNA synthesis and partitioning of DNA into separate cells during cell division. These functions must be present on every DNA molecule in every cell or the DNA will be lost within a few cell generations.

Plasmids are usually circular DNA molecules possessing DNA sequences allowing them to replicate independently from chromosomal DNA. The DNA sequence block where the replication of plasmid DNA is initiated is commonly called the "origin of replication" and the ability to replicate independently from chromosomal DNA is referred to as "extrachromosomal" replication.

Molecular biologists have developed techniques for cutting DNA molecules into fragments using sequence specific restriction enzymes, purifying the fragments and rejoining them in a different order. If one of the fragments of DNA used contains an origin of replication from an E. coli plasmid, the DNA can be inserted (transformed) into E. coli where it will replicate as a plasmid and can be produced in relatively large quantities. These techniques mean that genes from one organism, for example a human gene, can be flanked by regulatory DNA sequences from another organism, for example the bacterium E. coli, causing the human gene to be active in E. coli under entirely different regulatory controls. If the plasmid in question is constructed to include a second origin of replication allowing replication in a separate host cell, for example a mouse cell line, the gene can easily be transferred to the second host cell. Such a plasmid containing origins of replication for more than one host is commonly called a "shuttle vector". Plasmids are usually constructed to contain selectable markers, which are usually genes that confer antibiotic resistance or a metabolic advantage on the host cell to allow cells containing the plasmid to be distinguished from cells that have not received any plasmid during the transformation. Selectable marker genes must be flanked by appropriate DNA sequences to permit gene activity in the required host cell. It is possible to insert a plasmid into a host cell where it will be unable to replicate and so the only cells that survive the selection procedure will be those with the plasmid inserted into the host's chromosomal DNA. Such a plasmid without an appropriate origin of replication is called an "integrating plasmid".

A cell produces polypeptides and proteins by initially making a messenger RNA copy of the gene, a process called transcription which is under the control of the flanking DNA sequences as summarised above. The cellular biosynthetic machinery then reads (translates) the RNA sequence in three nucleotide groups called codons which specify the amino acids to be incorporated into the polypeptide chain. The genetic code and mechanism of protein synthesis is very similar in all organisms so molecular biology techniques can be used to construct plasmid vectors to produce recombinant proteins in many different host cells irrespective of the source of the original gene. However, different host cells may process the protein in different ways so it may, for example, be folded incorrectly or cleaved by protease enzymes. Most importantly, eukaryotic cells differ from bacteria by frequently linking further chemical structures onto their proteins, a process called "post-translational modification". The chemical structures linked to eukaryotic proteins may include several types of oligosaccharide chains, glycolipids, lipids, sulphate and phosphate groups, all of which may affect the physical and biological properties of the molecule. Common effects of these post-translational modifications include increased resistance to proteolysis, altered immunogenicity, altered in vivo clearance and uptake by different cell types.
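By way of illustration only, the reading of a coding sequence in three-nucleotide codons can be sketched in Python as follows. The function name `translate` and the example sequence are assumptions introduced for this sketch and form no part of the invention; the lookup table is the standard genetic code with codons enumerated in the base order T, C, A, G.

```python
from itertools import product

# Standard genetic code; codons are enumerated with bases in the order T, C, A, G,
# the third base varying fastest. "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a coding-strand DNA sequence, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3].upper()]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Hypothetical 12-nucleotide example: ATG GCT AAA TAA
print(translate("ATGGCTAAATAA"))  # prints "MAK"
```

The sketch covers only the codon lookup; it does not model folding, proteolytic cleavage or any post-translational modification, which are the host-dependent steps discussed above.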

Post-translational modifications frequently occur on proteins that are secreted from cells or are present on cell membranes. Such proteins include a wide variety of soluble proteins that mediate inter-cellular interactions, blood proteins and cell surface receptors and so are of considerable interest to the pharmaceutical industry as either the targets for drug research or for in vivo administration as therapeutic drugs in their own right. Since post-translational modifications may substantially alter the biological activity of such proteins (for example, tissue plasminogen activator (Ezzell, 1988, Nature 333, 383)), it is a goal of the biotechnology industry to produce each protein with a range of different modifications, both those that occur naturally and new modifications such as truncated oligosaccharide chains. However, proteins with post-translational modifications can only be produced in eukaryotic hosts and only a few eukaryotes have been used industrially. Mammalian tissue culture, for example Chinese Hamster Ovary Cells, is usually able to produce proteins with post-translational modifications similar to the natural protein, but is very expensive since these cells frequently require serum components in their growth media, have a slow growth rate and are relatively difficult to grow in large fermentors. Consequently, simple eukaryotes such as insect cells infected with baculovirus or yeast cells have been used to produce proteins with some post-translational modifications at a considerably lower cost. However, no one host is suitable for all recombinant proteins or can produce more than a few of the wide range of desirable post-translational modifications.

Dictyostelium has some advantages as a host for the production of low cost recombinant proteins with post-translational modifications (reviewed by Glenn & Williams, 1988, Australian J. Biotech. 1(4), 46-56). These include the production of N-linked glycosylation indistinguishable from the mammalian "high mannose form" and a wide variety of other structures including phosphatidyl-inositol-glycan tails. It is possible to alter the post-translational modifications produced by Dictyostelium by either using a range of mutant cultures which produce altered glycan structures or by simply harvesting the Dictyostelium cells at different stages of the life cycle. A considerable body of scientific literature is available on the culture and genetics of Dictyostelium (Spudich J. Ed. (1987) Methods in Cell Biology Vol. 28, Academic Press, London). Dictyostelium has a number of characteristics suitable for use in the production of recombinant proteins in fermenters, since the cells grow rapidly (4-10 hour cell cycle) and reach high densities (around 50 million cells per ml) in a nutrient medium. For some purposes, the ability of Dictyostelium to grow on a lawn of bacteria on a simple nutrient medium provides a remarkably simple and cheap culture technique when compared with mammalian or even insect tissue culture.

Dictyostelium strains are known to possess at least thirteen different plasmids (Farrat & Williams (1988) Trends in Genetics 4, 343-348), but only Ddp1, Ddp2 and pDG1 have been studied in detail. Plasmid pDG1 is very unstable when cloned in E. coli (Orii et al (1989) Nucleic Acids Research 17, 1395-1408) so most constructions of shuttle vectors have used sequences from either Ddp1 or Ddp2. Plasmid Ddp1 is 12.3 Kb in size, but Ahern et al (Nucleic Acids Research (1988) 16, 6825-6837) showed that a vector containing a selectable marker (G418 resistance) and only 2.2 Kb of Ddp1 was able to replicate extrachromosomally in D. discoideum. However, the copy number per cell of this truncated plasmid was lowered from the 150 characteristic of the parent plasmid to only 10-15 copies per cell. It is probable that this low copy number plasmid may not segregate efficiently at cell division and so may be unstable in the absence of continuous selection with the antibiotic G418. Incorporation of additional Dictyostelium DNA into such plasmids based on the Ddp1 origin of replication prevents them being maintained extrachromosomally (Gurniak et al, (1990) Current Genetics 17, 321-325) so they are unsuitable for use in the biotechnology industry.

The practical application of plasmids constructed from sections of Ddp2 has been limited by technical difficulties. The majority of techniques used in molecular biology are designed for use in the bacterium E. coli so the manipulation of Dictyostelium DNA requires it to be cloned into a vector capable of replication in E. coli. Consequently, research on Ddp2 has concentrated on the construction of recombinant "shuttle vectors" containing sequences allowing replication in both E. coli and Dictyostelium spp. Plasmid pMUW111 illustrates a shuttle vector that the present inventors have constructed (FIG. 4), which contains a 4.139 Kb HindIII-ScaI restriction fragment of Ddp2. This is close to the minimum amount of Ddp2 which can maintain extrachromosomal replication in wild type strains of Dictyostelium. Leiting and Noegel (1988 Plasmid 20, 241-248) have used a similar 4.0 Kb fragment of Ddp2 with approximately 300 bp deleted close to the XhoI restriction site to construct a 9.6 Kb shuttle vector called pnDe1. However, despite containing minimal sections for the extrachromosomal replication of Ddp2, both these shuttle vectors (pMUW111 and pnDe1) suffer from problems of instability when maintained in E. coli. This is consistent with the Ddp2 DNA containing sequences that are unstable in E. coli. This problem can be mitigated by the use of host strains which lack exo-nuclease I and have low plasmid copy number (e.g. strain CES 201), but such hosts frequently present problems in preparing sufficient plasmid DNA for gene cloning experiments and for transforming back into Dictyostelium.

The necessity of using pieces of Ddp2 DNA approximately 4 Kb long to construct shuttle vectors also raises problems with regard to the final size of the plasmid. The shuttle vector must contain selectable markers for both hosts together with appropriate promoter and termination sequences. These sequences comprise nearly 50% of the size of plasmids pMUW111 and pnDe1. In addition, to be of any practical use a shuttle vector must be capable of carrying additional DNA containing a gene to be expressed in Dictyostelium together with appropriate controlling sequences. These additional sequences are likely to amount to at least 2 Kb of DNA, bringing the total plasmid size to around 12 Kb. Increasing the size of the plasmid to over 10 Kb decreases its stability, a factor of considerable importance for the commercial production of recombinant proteins where, in order to avoid contamination of the product, regulatory authorities do not permit the use of antibiotic selection to ensure plasmid maintenance while cells are grown for extended periods. A large plasmid also raises difficulties since fewer restriction enzymes will cut the plasmid at only one position, and such single-cut sites are the most suitable for genetic manipulations.
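The point about single-cut sites can be illustrated with a short Python sketch that counts how often each recognition sequence occurs in a circular plasmid and reports the enzymes that cut exactly once. The recognition sequences shown are the well-known ones for EcoRI, HindIII and XhoI; the function names and the toy sequence are hypothetical and are not taken from any plasmid described herein.

```python
# Recognition sequences (5'->3'). All three are palindromic, so scanning one
# strand is sufficient.
SITES = {
    "EcoRI": "GAATTC",
    "HindIII": "AAGCTT",
    "XhoI": "CTCGAG",
}


def count_sites(plasmid, site):
    """Count occurrences of a recognition site in a circular sequence."""
    # Append the first few bases so that a site spanning the arbitrary
    # start of the circular sequence is also found.
    extended = plasmid + plasmid[:len(site) - 1]
    return sum(
        1 for i in range(len(plasmid)) if extended[i:i + len(site)] == site
    )


def single_cutters(plasmid, sites=SITES):
    """Names of the enzymes whose site occurs exactly once in the plasmid."""
    return sorted(
        name for name, site in sites.items() if count_sites(plasmid, site) == 1
    )


# Hypothetical toy sequence: one EcoRI site, two HindIII sites, no XhoI site.
toy = "GAATTCAAGCTTGGGAAGCTTCCC"
print(single_cutters(toy))  # prints ['EcoRI']
```

As more DNA is added to a vector, the chance that any given recognition sequence occurs a second time rises, so the list returned by such a scan tends to shorten.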

Shuttle vectors capable of being easily manipulated in E. coli and transferred back into Dictyostelium spp. are an essential pre-requisite for realising the potential of Dictyostelium in biotechnology. The present inventors have discovered means by which such vectors containing sections of Ddp2 smaller than 4 Kb can be constructed.

The present inventors have elucidated the full nucleotide sequence of the plasmid Ddp2 and have determined that a portion of this sequence encodes a gene designated Rep. The present inventors have shown that the presence of a polypeptide encoded by the Rep gene is essential for extrachromosomal replication of the Ddp2 plasmid.

DISCLOSURE OF THE INVENTION

Accordingly, in a first aspect the present invention consists in a polypeptide which facilitates the extrachromosomal replication of a recombinant plasmid in Dictyostelium spp., the recombinant plasmid including an origin of replication derived from a Ddp2-like plasmid, but lacking functional genes for extrachromosomal replication in wild type Dictyostelium spp.

In a preferred embodiment of this aspect of the present invention the recombinant plasmid includes an origin of replication derived from plasmid Ddp2.

In a preferred embodiment of this aspect of the present invention the polypeptide has an amino acid sequence substantially as shown in FIG. 2 (SEQ ID NO: 3).

In a further preferred embodiment of this aspect of the present invention the polypeptide is encoded by a DNA sequence substantially as shown in FIG. 1 (SEQ ID NO: 2) from nucleotide 2378 to nucleotide 5038.

As used herein the phrase "Ddp2-like plasmid" is intended to cover plasmids having similar structure and similar functional regions to plasmid Ddp2. One example of such a Ddp2-like plasmid is plasmid pDG1.

In a second aspect the present invention consists in a recombinant plasmid vector, said vector being characterised in that it includes an origin of replication derived from plasmid Ddp2 or plasmid pDG1 and that it lacks functional genes for extrachromosomal replication in wild type Dictyostelium.

In a third aspect the present invention consists in a recombinant plasmid vector containing a DNA sequence substantially as shown in FIG. 1 (SEQ ID NO: 2) from nucleotide 1 to nucleotide 2436 or a subsection thereof, and lacking functional genes for extrachromosomal replication in wild type Dictyostelium spp.

In a fourth aspect the present invention consists in a recombinant plasmid vector containing a DNA sequence substantially as shown in FIG. 1 (SEQ ID NO: 2) from nucleotide 1153 to nucleotide 1775 or a subsection thereof, and lacking functional genes for extrachromosomal replication in wild type Dictyostelium spp.

In a fifth aspect the present invention consists in a recombinant plasmid vector containing the DNA sequence TGTCATGACA (SEQ ID NO: 1) but lacking functional genes for extrachromosomal replication in wild type Dictyostelium spp.

In a sixth aspect the present invention consists in a recombinant plasmid vector containing a DNA sequence substantially as shown in FIG. 1 (SEQ ID NO: 2) from nucleotide 1 to nucleotide 3241 or a portion thereof and lacking functional genes for extrachromosomal replication in wild type Dictyostelium spp.

It is presently preferred that the recombinant plasmid vector includes a heterologous DNA sequence(s) encoding a desired polypeptide, a promoter sequence(s) that controls the expression of the heterologous DNA sequence(s), and preferably a sequence(s) including a selectable marker.

In a preferred embodiment of the present invention the recombinant plasmid vector includes a DNA sequence encoding a polypeptide and regulatory sequences for secretion of the desired polypeptide.

In a further preferred embodiment of the present invention the recombinant plasmid vector includes an expression cassette comprising a promoter DNA sequence derived from the Dictyostelium Actin 15 gene, a DNA sequence encoding the secretion signal peptide sequence of the D19 gene which encodes the protein PsA and a DNA sequence for RNA polyadenylation signal derived from the Actin 15 gene.

In a further preferred embodiment of the present invention, the recombinant vector includes the sequence of plasmid pMUW102, plasmid pMUW130 or plasmid pMUW1530 and a heterologous DNA sequence encoding a desired polypeptide together with DNA sequences enabling the expression of the sequence encoding the desired polypeptide.

In a seventh aspect, the present invention consists in a recombinant strain of Dictyostelium, the recombinant strain being characterised in that the strain includes a gene encoding a polypeptide which facilitates the extrachromosomal replication of a recombinant plasmid, the recombinant plasmid including an origin of replication derived from plasmid Ddp2 but lacking the functional gene for extrachromosomal replication in wild type Dictyostelium.

In a preferred embodiment of the present invention the recombinant plasmid includes an origin of replication derived from plasmid Ddp2, and is more preferably the recombinant plasmid of one of the second to sixth aspects of the present invention.

The gene encoding the polypeptide which facilitates the extrachromosomal replication of the recombinant plasmid may be present in a chromosome of the recombinant strain of Dictyostelium or carried on a second plasmid, the second plasmid lacking an origin of replication derived from Ddp2. It is, however, presently preferred that the gene encoding the polypeptide is carried on a chromosome.

It is presently preferred that the recombinant strain of Dictyostelium has included within a chromosome the Rep gene.

In a further preferred embodiment of the present invention the chromosome of the recombinant strain of Dictyostelium includes a sequence substantially as shown in FIG. 1 (SEQ ID NO: 2) from nucleotide 1885 to nucleotide 5292.

In a further preferred embodiment of the present invention the recombinant strain of Dictyostelium harbors a recombinant plasmid, the recombinant plasmid including an origin of replication derived from plasmid Ddp2 or plasmid pDG1, and preferably a DNA sequence encoding a desired polypeptide together with a DNA sequence enabling the expression of the sequence encoding the desired polypeptide, but lacking functional genes for extrachromosomal replication in wild type Dictyostelium.

In an eighth aspect the present invention consists in a method of producing a desired polypeptide comprising the following steps:

1. Transforming a recombinant strain of Dictyostelium with a recombinant plasmid vector including a DNA sequence encoding the desired polypeptide and sequences enabling the expression of the DNA sequence encoding the desired polypeptide;

2. Culturing the recombinant strain of Dictyostelium under conditions which allow the expression of the DNA sequence encoding the desired polypeptide and allowing the desired polypeptide to be produced either as a cell bound form or be secreted; and

3. Recovering the secreted desired polypeptide; characterised in that the recombinant plasmid vector includes an origin of replication derived from plasmid Ddp2 but lacks the functional genes for extrachromosomal replication in wild type Dictyostelium; and

that the recombinant strain of Dictyostelium includes a gene encoding a polypeptide which facilitates the extrachromosomal replication of the recombinant plasmid.

As used herein the phrase "cell bound form" is intended to cover proteins either internal to the cell or present on the cell membrane.

In a preferred embodiment of this aspect of the present invention the gene encoding the polypeptide which facilitates the extrachromosomal replication of the recombinant plasmid is present in a chromosome of the recombinant strain. Alternatively the gene is carried on a second recombinant plasmid present in the recombinant strain.

In a ninth aspect the present invention consists in a DNA molecule which includes a nucleotide sequence which encodes a polypeptide and which is capable of transforming Dictyostelium strains such that recombinant plasmid vectors which include an origin of replication derived from a Ddp2-like plasmid, preferably plasmid Ddp2, and which are incapable of extrachromosomal replication in wild type Dictyostelium spp., are capable of extrachromosomal replication in the transformed Dictyostelium strain.

In a preferred embodiment of this aspect of the present invention the DNA molecule includes a sequence substantially as shown in FIG. 1 (SEQ ID NO: 2) from nucleotide 2378 to nucleotide 5038, or part thereof.

As stated above, the present invention relates to the construction of extrachromosomal plasmid vectors for Dictyostelium using much smaller sections of the plasmid Ddp2 than has previously been possible. The present invention enables the construction of plasmid vectors containing an origin of replication derived from Ddp2 which can be encoded on a section of Ddp2 DNA of less than 3.0 Kb, but omit sections of Ddp2 DNA that contain genes for polypeptides essential for replication and preferably DNA sequences that are unstable when cloned in E. coli. The replication of such plasmids can be achieved by maintaining them in recombinant strains of Dictyostelium where the polypeptides required for plasmid replication are provided by genes inserted into the chromosomal DNA of the host cell or alternatively into another compatible plasmid vector. The present invention enables the production of a wide range of plasmid vectors which may be constructed using the techniques known in the art and disclosed herein, including plasmids designed for the expression of recombinant protein products in Dictyostelium spp.

The present invention further comprises the use of these recombinant Dictyostelium strains for the maintenance of recombinant plasmids containing an origin of replication derived from Ddp2 but lacking functional genes for replication proteins. The maintenance of recombinant plasmids in hosts that have been genetically modified to supply polypeptides necessary for plasmid replication is likely to be a crucial factor in the production of recombinant proteins using Dictyostelium spp.

SHORT DESCRIPTION OF THE DRAWINGS

In order that the nature of the present invention may be more clearly understood preferred forms thereof will now be described with reference to the following examples and accompanying figures, in which:

FIG. 1 (SEQ ID NO: 2) is the nucleotide sequence of the Dictyostelium plasmid Ddp2. The sequence of one strand of DNA is shown, numbered clockwise from the SalI restriction enzyme site. The positions of the recognition sites of restriction enzymes SalI, HindIII, BglII, NdeI, ClaI, EcoRI, EcoRV, PstI, BclI, XbaI, XhoI, AccI, HindII and ScaI are indicated. START and STOP indicate the positions of the first and last codons of the Rep gene respectively.

    KEY: A=Adenine. C=Cytosine. G=Guanine. T=Thymine;

FIG. 2 is the amino acid sequence (SEQ ID NO: 3) of the polypeptide encoded by the Rep gene as derived from the DNA sequence of plasmid Ddp2. The nucleotide sequence of the coding strand of the Rep gene, numbered clockwise from the cleavage site of the SalI restriction enzyme, is aligned with the amino acid sequence predicted from the standard genetic code.

    KEY: A=Adenine. C=Cytosine. G=Guanine. T=Thymine. a=Alanine. c=Cysteine. d=Aspartic acid. e=Glutamic acid. f=Phenylalanine. g=Glycine. h=Histidine. i=Isoleucine. k=Lysine. l=Leucine. m=Methionine. n=Asparagine. p=Proline. q=Glutamine. r=Arginine. s=Serine. t=Threonine. v=Valine. w=Tryptophan;

FIG. 3 is a schematic representation of the major structural features of Ddp2 aligned with a map of the cleavage sites of some restriction enzymes;

FIG. 4 is a schematic representation of the construction of plasmid pMUW111;

FIG. 5 is a schematic representation of the construction of plasmid pMUW110;

FIG. 6 is a schematic representation of the construction of plasmid pMUW102;

FIG. 7 is a schematic representation of the construction of plasmid pMUW130;

FIG. 8 is a schematic representation which summarizes the Ddp2 sequences used to construct plasmids pMUW111, pMUW102, pMUW110 and pMUW130;

FIG. 9 is a schematic representation of the construction of the shuttle vectors pMUW1530 and pMUW1580;

FIG. 10 is the nucleotide sequence (SEQ ID NO: 5) of the shuttle vector pMUW1530. The sequence of one strand of DNA is shown, numbered anti-clockwise from the ClaI restriction enzyme site. The position of the recognition sites of restriction enzymes ClaI, ScaI, BamHI, BglII and NdeI are indicated.

    KEY: A=Adenine. C=Cytosine. G=Guanine. T=Thymine;

FIG. 11 is a schematic representation of the construction of the promoter and secretion signal sequence sections of an expression cassette in plasmid pMUW1594;

FIG. 12 is a schematic representation of the cloning of the polyadenylation sequence from the Dictyostelium Actin 15 gene into plasmid pMUW1560;

FIG. 13 is a schematic representation of the construction of the expression cassette in pMUW1621;

FIG. 14 is a schematic representation of the construction of the expression vectors pMUW1630 and pMUW1633 by insertion of the expression cassette into the shuttle vector pMUW1580; and

FIG. 15 is the nucleotide sequence (SEQ ID NO: 4) of the expression vector pMUW1630. The sequence of one strand of DNA is shown, numbered anti-clockwise from the ClaI restriction enzyme site. The position of the recognition sites of restriction enzymes ClaI, ScaI, NsiI, HindIII, SmaI and KpnI are indicated. START indicates the position of the first codon of secretion signal peptide in the expression cassette.

    KEY: A=Adenine. C=Cytosine. G=Guanine. T=Thymine.

BEST MODE OF CARRYING OUT THE INVENTION

The present inventors have established for the first time the full nucleotide sequence of the Dictyostelium plasmid Ddp2 as shown in FIG. 1 (SEQ ID NO: 2). The nucleotide sequence has been numbered clockwise around the circular DNA molecule starting at the single cut site of the SalI restriction enzyme. Detailed examination of the DNA sequence of Ddp2 has allowed different functional regions of the plasmid to be distinguished, as shown in FIG. 3, and regions likely to be unstable when cloned in E. coli. The elucidation of these different functional regions has allowed the present inventors to overcome a number of the technical problems that have hitherto limited the use of extrachromosomal vectors in Dictyostelium.

The DNA sequence of Ddp2 between nucleotide 2378 and 5038 encodes a gene referred to herein as Rep. This section of Ddp2 contains a large "open reading frame" where one of the six possible ways to read the genetic code in nucleotide triplets (known as codons) has a long region without any of the codons that act as stop signals for protein translation. Such an "open reading frame" considered along with flanking sequences that are similar to the promoter and poly-adenylation signals of previously described Dictyostelium genes (Kimmel & Firtel, 1982 In The Development of Dictyostelium discoideum, Academic Press, New York, pp. 234-324) is strong evidence that the Rep gene could be transcribed into RNA and translated into a polypeptide containing 887 amino acids with the sequence (SEQ ID NO: 3) shown in FIG. 2. Evidence supporting the view that the Rep gene is translated into a polypeptide comes from the inability of plasmids constructed with interruptions to the Rep gene, for example pMUW102, to replicate in wild type strains of Dictyostelium discoideum. The RNA and polypeptide products of the Rep gene have not yet been detected, and the gene product is believed to be produced in only low amounts to positively regulate the initiation of plasmid replication by the host enzymes that normally replicate chromosomal DNA. However, it should be appreciated that either the messenger RNA or the translated polypeptide derived from the Rep gene could be processed by the cellular biochemical machinery to produce one or more shorter polypeptides. It is also likely that the polypeptide contains regions that act as negative regulators of plasmid copy number. None of these areas of uncertainty detracts from the basic discovery that at least part of the open reading frame encodes a polypeptide that is essential for the replication of Ddp2. This finding explains the previously established need for shuttle vectors to contain a large section of Ddp2 DNA, since such vectors would need to contain both the origin of replication and an additional 2.66 kilobase pair Rep gene plus flanking control sequences.
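The relationship between the coordinates of the open reading frame, its length and its codon count can be checked with a few lines of Python. This is a sketch only: it assumes that both quoted coordinates are inclusive, and the helper name `span_length` is introduced here purely for illustration.

```python
def span_length(first, last):
    """Number of nucleotides from position `first` to `last`, both inclusive."""
    return last - first + 1


# Coordinates of the Rep open reading frame as quoted in the text (FIG. 1 numbering).
REP_FIRST, REP_LAST = 2378, 5038

length = span_length(REP_FIRST, REP_LAST)
codons, remainder = divmod(length, 3)

print(length)     # prints 2661, i.e. about 2.66 kilobase pairs
print(codons)     # prints 887
print(remainder)  # prints 0, so the span is a whole number of codons
```

Under the inclusive-coordinate assumption the span divides exactly into 887 codons, in agreement with the figures of 2.66 kilobase pairs and 887 quoted above.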

Plasmid vectors based on Ddp2 need to contain DNA from the section of Ddp2 between the HindIII restriction enzyme site at 1153 base pairs and the BglII restriction enzyme site at 1885 base pairs.

This is demonstrated by the inability of plasmids that lack this section of DNA, for example pMUW110 (FIG. 5), to replicate in wild type strains of Dictyostelium. Plasmid pMUW110 contains the complete Rep gene plus flanking sequences including the polyadenylation sequences and 483 nucleotides encompassing the promoter region. Thus pMUW110 contains the sequences required to produce the polypeptide required for replication, but lacks a functional origin of replication. Consequently, a Ddp2 origin of DNA replication or associated control sequences must lie before the BglII restriction enzyme site at 1885 base pairs. This region of Ddp2 is present in plasmid pMUW102 (FIG. 6), which contains the section of Ddp2 between the HindIII restriction enzyme site at 1153 base pairs and the XhoI restriction enzyme site at 3242 base pairs, but plasmid pMUW102 lacks a functional Rep gene and so is unable to replicate in wild type strains of Dictyostelium. The presence of a functional origin of replication in plasmid pMUW102 is demonstrated by transforming it into Dictyostelium strains along with plasmid pMUW110 to provide the essential replication polypeptide from the Ddp2 Rep gene. The present inventors' experimental results clearly show that plasmid pMUW110 is inserted into the chromosomal DNA to form a stable recombinant strain of Dictyostelium and, in the same cells, plasmid pMUW102 is stably maintained as an extrachromosomal plasmid. This demonstration of an extrachromosomal plasmid containing an origin of replication from plasmid Ddp2 and its maintenance in a Dictyostelium strain by virtue of chromosomal DNA containing the Rep gene encoding polypeptides essential for plasmid replication represents a significant technical advance. It is apparent to one skilled in the art that similar techniques can be utilised for the construction of a diverse range of plasmid vectors for Dictyostelium.

It is relevant to briefly examine the mechanism for selecting cells that were successfully transformed with both pMUW102 and pMUW110. Both these vectors contain a selectable marker conferring resistance to the antibiotic G418, but other genes could be used to serve the same function. In fact the present inventors have developed another resistance gene, for bleomycin, for use as a selectable marker in Dictyostelium. The G418 resistance gene is under the control of the Dictyostelium actin 6 promoter and the actin 8 3' poly-adenylation signals to ensure that it is expressed in Dictyostelium cells, providing a method of selecting the few cells that take up the plasmid DNA. Plasmid pMUW110, which lacks an origin of replication, can only be retained in those few cells where the plasmid becomes integrated into the chromosomal DNA. Any cells that are transformed with only plasmid pMUW102 can only be resistant to G418 if the plasmid becomes integrated into the chromosomal DNA, since this plasmid cannot replicate without the polypeptide produced by the Rep gene. However, some of the cells that receive both plasmids can have the plasmid pMUW110 integrated into the chromosomal DNA in a manner that preserves the function of the Rep gene and so will be able to maintain multiple extrachromosomal copies of the plasmid pMUW102. Once the cells transformed with both plasmids pMUW102 and pMUW110 have been selected by resistance to G418 they may be stably maintained in the absence of the antibiotic.
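
The selection logic described in this paragraph can be summarised as a small decision function. The following Python sketch is illustrative only: the function name and the boolean flags are hypothetical labels introduced here, and the model ignores everything except the rules stated in the text.

```python
def transformant_outcome(has_pmuw102, pmuw110_integrated, rep_functional,
                         pmuw102_integrated=False):
    """Return (g418_resistant, pmuw102_extrachromosomal) for one cell.

    Hypothetical model of the rules stated in the text:
    - pMUW110 has no origin, so it persists only when integrated.
    - pMUW102 replicates as a free plasmid only when a functional,
      integrated Rep gene supplies the replication polypeptide.
    """
    rep_available = pmuw110_integrated and rep_functional
    extrachromosomal = has_pmuw102 and rep_available
    # Both vectors carry the G418 marker, so any retained copy confers resistance.
    resistant = pmuw110_integrated or pmuw102_integrated or extrachromosomal
    return resistant, extrachromosomal


# Cell that received both plasmids, with Rep function preserved on integration.
print(transformant_outcome(True, True, True))
# Cell that received only pMUW102 and did not integrate it.
print(transformant_outcome(True, False, False))
```

Only the first case yields a G418 resistant cell carrying pMUW102 as a free plasmid, which is the class of transformant the selection is intended to recover.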

Plasmid pMUW102 contains 2089 base pairs of Ddp2, a considerably smaller section of Ddp2 than previously known to be capable of extrachromosomal replication. This sequence has been substantially shortened by removing more of the Ddp2 DNA sequences that are not essential for the replication of plasmid pMUW102 in recombinant strains of Dictyostelium. The results with plasmid pMUW130 confirm that all the DNA sequences necessary for stable extrachromosomal replication at high copy number are contained in a 622 base pair HindIII-ClaI fragment of Ddp2. In the light of present knowledge as disclosed herein, it is also relatively simple to ascertain the essential sequences within the section of Ddp2 between the HindIII restriction enzyme site at 1153 base pairs and the ClaI restriction enzyme site at 1775 base pairs using standard molecular biology techniques such as deletions and insertions. Experiments to determine the minimum section of Ddp2 DNA sequence necessary for plasmid vector construction have been carried out. Several copies of a TGTCATGACA (SEQ ID NO: 1) sequence are essential for the function of the Ddp2 origin of replication.
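
The fragment sizes quoted in this description follow directly from the Ddp2 site coordinates. The short Python check below is a sketch for the reader's convenience; the dictionary and function names are hypothetical, and the coordinates are those quoted in this specification (HindIII 1153, ClaI 1775, BglII 1885, XhoI 3242).

```python
# Ddp2 restriction site coordinates (base pairs) as quoted in the text.
SITES = {"HindIII": 1153, "ClaI": 1775, "BglII": 1885, "XhoI": 3242}


def fragment_length(left, right, sites=SITES):
    """Length in base pairs of the Ddp2 section between two named sites."""
    return sites[right] - sites[left]


print(fragment_length("HindIII", "XhoI"))   # section carried by pMUW102
print(fragment_length("HindIII", "ClaI"))   # fragment carried by pMUW130
print(fragment_length("HindIII", "BglII"))  # section absent from pMUW110
```

The three values are 2089, 622 and 732 base pairs respectively, matching the figures given for pMUW102, pMUW130 and the section missing from pMUW110.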

The use of smaller sections of Ddp2 for vector construction than previously possible allows the omission of some of the sequences likely to be responsible for plasmid instability in E. coli. Plasmid pMUW130 contains only one copy of sequences in the 501 base pair inverted repeat of Ddp2 and does not contain the long stretches of poly-adenine or poly-thymidine found between the end of the open reading frame and the SalI restriction enzyme site. Such inverted repeats and poly-adenine or poly-thymidine sequences are known to be unstable in E. coli. Plasmid pMUW130 also omits the (GATGAA)11 (SEQ ID NO: 19) repeat found at the end of the Rep gene, which is also likely to be unstable in E. coli. Therefore, it appears that the smaller sections of Ddp2 used to construct plasmid vectors according to this invention have fewer of the problems of stability in E. coli than were previously encountered using larger segments of Ddp2 DNA.

The integrating plasmid pMUW110 contains all the information necessary for the controlled expression of the Ddp2 Rep gene required to maintain the copy number of plasmid pMUW102. This control of plasmid copy number could not be predicted since there would be no direct linkage between the number of copies of the plasmid and the Rep gene as in the original plasmid. It is thought that this copy number control is probably achieved by an auto-regulatory mechanism where the product of the Rep gene represses further transcription from the Rep gene and so maintains a constant cellular concentration of the polypeptide that regulates plasmid replication. The localisation of the promoter sequences to the section of Ddp2 DNA between the BglII restriction enzyme site and the start of the Rep gene, as disclosed herein, allows future experiments to determine the regulatory mechanisms governing the transcription of the open reading frame and control of plasmid copy number. It is anticipated that this approach will lead to experimental control of plasmid replication and copy number by suitable modification or duplication of the control sequences.

In the experiments described herein, the plasmid pMUW110 has been stably integrated into the Dictyostelium chromosomal DNA using the same selective marker, G418 resistance, as present on the extrachromosomal plasmid pMUW102. However, there would be advantages in using a different selective marker on the integrating vector from that used for the extrachromosomal plasmid. The present inventors have developed a thymidylate synthase gene as a second marker for selection in a Dictyostelium discoideum strain that is unable to synthesise thymidine (Chang et al, 1989, Nucleic Acids Research 17, 3655-3661). The thymidylate synthase selection has the advantage for biotechnological uses in that the selection is maintained in the absence of any antibiotic. Clearly any combination of selectable markers can be used on the integrating or extrachromosomal vectors, but the preferred combination is to have the thymidylate synthase marker on the extrachromosomal plasmid and maintain it in the enzyme deficient Dictyostelium strain. This means that, without using any antibiotic selection, any host cell losing either the extrachromosomal plasmid or the functional integrated vector would be unable to grow since any cell losing the production of the polypeptide necessary for plasmid replication would also lose the functions encoded on the extrachromosomal plasmid.

Examples of the application of the invention have been demonstrated by the construction of a range of shuttle vectors and the production of a recombinant protein in Dictyostelium discoideum. The novel shuttle vectors pMUW1530, pMUW1570 and pMUW1580 incorporate the Ddp2 origin of replication on the 600 bp XbaI--ClaI fragment (1175-1775 bp) of Ddp2 into a small E. coli plasmid (pMUW1510) that contains close to the minimal amount of sequence from pBR322 required for replication in E. coli, in order to reduce the potential for these sequences to adversely affect the function of the shuttle vector in D. discoideum. Another useful feature of these shuttle vectors is that they contain very few sites for six base restriction enzymes, apart from single BamHI and ClaI sites in appropriate positions for the insertion of additional DNA without disrupting essential functions. Sequences that might be inserted into such sites include genes for the production of recombinant proteins or selective markers, promoter sequences to control gene function and signal sequences for the correct processing of messenger RNA molecules and the translated proteins. This is illustrated by the construction of a novel "expression cassette" suitable for the production and secretion of recombinant proteins from Dictyostelium cells. This expression cassette contains the promoter from the D. discoideum actin 15 gene, a section of the D19 gene encoding a secretion signal peptide, the polylinker from the E. coli plasmid pGEM3Z (for insertion of genes for expression) and lastly the polyadenylation signal from the D. discoideum actin 15 gene. However, it will be apparent to one skilled in the art that a wide range of similar constructs could be made for this purpose using DNA sequences from other genes or even completely synthetic sequences serving the same functions.

The application of the shuttle vector based on the technology disclosed in this document was demonstrated by the production of a recombinant protein, the enzyme β-glucuronidase encoded by an E. coli gene, in D. discoideum cells containing an expression vector constructed by inserting the expression cassette into the shuttle vector pMUW1580.

Plasmid Ddp2 is believed to be the first functionally characterized member of a new group of structurally and functionally similar plasmids. This new group of plasmids can be defined as all encoding a single polypeptide of 700-1000 amino acids which is essential for plasmid replication and which has sequence homologies with the Ddp2 Rep gene, indicating a common evolutionary origin. Further, the origin of replication of these plasmids is associated with one arm of an inverted repeat sequence that is distinct from the Rep gene. The inventors confidently predict that the techniques they have disclosed in this application can be used to construct further extrachromosomal plasmid vectors for use in the biotechnology industry starting from the functionally analogous regions of any of this broader group of "Ddp2-like" plasmids.

The only other member of this "Ddp2-like" group of plasmids to have been sequenced to date is plasmid pDG1, isolated from an unidentified Dictyostelium species (Orii et al (1987) Nucleic Acids Res. 15, 1097-1107). Plasmid pDG1 has a very similar structure to Ddp2, possessing similar sized inverted repeats and a single open reading frame analogous to the Rep gene of Ddp2. Despite plasmid pDG1 having been fully sequenced, nothing is known regarding the functions of these features or the location of the origin of replication (Orii et al (1989) Nucleic Acids Res. 17, 1395-1408). The only recombinant shuttle vector produced with pDG1 sequences incorporated the long, 4.2 Kb ClaI fragment of pDG1, i.e., omitting only 0.2 Kb from the whole plasmid (Orii et al (1989) Nucleic Acids Res. 17, 1395-1408). Such pDG1 based plasmids are very unstable in E. coli (Saing et al (1988) Mol. Gen. Genet. 214, 1-5) and so are unsuitable for use in the production of recombinant proteins.

The plasmid pDG1 is recognized as a member of the "Ddp2-like" group of plasmids by virtue of its having a similar structure and having sequence homologies with Ddp2 in the region of the open reading frame at both the DNA and amino acid levels. The non-coding regions of these two plasmids have little sequence homology, apparently being free to diverge in the course of evolution. The presence of large inverted repeats in both pDG1 and Ddp2 is probably not a key feature of the group of "Ddp2-like" plasmids as only one copy is essential for the replication of Ddp2.

In the light of the functional data from the analogous regions of Ddp2, as disclosed in this application, it is possible to re-evaluate the pDG1 sequence data and predict that the pDG1 origin of replication lies outside the open reading frame and overlaps with one of the inverted repeats. In addition, the speculation (Orii et al (1989) Nucleic Acids Res. 17, 1395-1408) concerning the weak homologies of the Rep gene with reverse transcriptase is unlikely to be correct as the homology is not conserved in Ddp2. The Rep gene of Ddp2 can be aligned with the open reading frame of pDG1 with 35% of amino acids in identical positions, indicating considerable evolutionary homologies. The proteins encoded by the two plasmids also have similar structures, being comprised of two similar sized domains separated by a threonine rich sequence, with the carboxy terminus of both proteins being a highly acidic glutamic and aspartic acid rich sequence. To one skilled in the art, the similarities between the proteins produced by these two plasmids indicate that they have very similar functions and also indicate regions of high sequence homology which are most likely to have roles crucial for the protein's function. Whilst it is unlikely that the protein from pDG1 would be sufficient to cause replication of the Ddp2 origin of replication (and vice versa) because the sequence recognized by the protein is likely to be specific to the individual origin of replication, it is very likely that novel proteins constructed from sections of both proteins would function correctly. For example, the replacement of the acidic carboxy terminus of the Ddp2 Rep protein with the carboxy terminus of the pDG1 protein should not affect the ability of the molecule to allow replication from the Ddp2 origin of replication.
Furthermore, it should be possible to change the specificity of the Ddp2 Rep gene simply by replacing the section of the protein that recognizes the Ddp2 origin of replication by a section recognizing an origin of replication from another member of the "Ddp2-like" group of plasmids. Clearly, the basic technology disclosed in this application, whereby the replication protein and the origin of replication are separated onto separate vectors, is capable of a wide range of different applications for the construction of plasmid vectors incorporating sections from the broad group of "Ddp2-like" plasmids.
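
A figure such as "35% of amino acids in identical positions" is obtained by counting matching residues in a pairwise alignment. The Python sketch below shows one common way to do this. The two aligned strings are invented toy sequences, not Ddp2 or pDG1 data, and the choice of denominator (aligned positions without gaps) is an assumption, since the text does not state which convention was used.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of gap-free aligned positions holding identical residues."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b)
             if x != gap and y != gap]
    if not pairs:
        return 0.0
    identical = sum(1 for x, y in pairs if x == y)
    return 100.0 * identical / len(pairs)


# Toy alignment (hypothetical residues): 5 identities over 7 gap-free columns.
print(round(percent_identity("MKTEE-DD", "MRTEEADE"), 1))
```

Other conventions divide by the length of the shorter sequence or by the full alignment length, so reported identities are only comparable when the same convention is used.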

EXAMPLE 1 Sequencing of plasmid Ddp2

Our laboratory at Macquarie University sequenced Ddp2 by cutting Ddp2 DNA into many small fragments and cloning them separately into a commercially available plasmid called pGEM3Z (Promega Corporation, Madison, USA). In this vector, small sections of Ddp2 DNA were stable and could be sequenced using a technique called "double stranded sequencing" where a small oligonucleotide is used to prime the synthesis of a new radio-labelled DNA strand on a template of denatured plasmid DNA. The oligonucleotide primer can be the complementary sequence to the SP6 or T7 regions flanking the cloning site or it can be a custom synthesised oligonucleotide with a sequence that matches part of the cloned Ddp2 DNA.

Ddp2 DNA was digested with the restriction enzymes ClaI, Sau3A, AluI or RsaI and cloned into the plasmid pGEM3Z at the AccI, BamHI or SmaI restriction enzyme sites using standard molecular biology techniques, and transformed into the E. coli strain JM109. Clones containing Ddp2 DNA were selected at random and stored at -80 degrees in broth containing 15% glycerol.
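
The effect of digesting DNA with these four enzymes can be illustrated with a short Python sketch. The recognition sequences and cut offsets are the standard published ones for ClaI (AT^CGAT), Sau3A (^GATC), AluI (AG^CT) and RsaI (GT^AC). The test sequence is an invented toy string treated as a linear molecule, not Ddp2 sequence, and the function names are hypothetical.

```python
# Recognition sequence and top-strand cut offset for each enzyme.
ENZYMES = {
    "ClaI": ("ATCGAT", 2),
    "Sau3A": ("GATC", 0),
    "AluI": ("AGCT", 2),
    "RsaI": ("GTAC", 2),
}


def cut_positions(seq, site, offset):
    """Return top-strand cut positions for every occurrence of a site."""
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + offset)
        start = seq.find(site, start + 1)
    return positions


def digest(seq, enzyme):
    """Cut a linear sequence with one enzyme and return the fragments."""
    site, offset = ENZYMES[enzyme]
    seq = seq.upper()
    edges = [0] + cut_positions(seq, site, offset) + [len(seq)]
    return [seq[a:b] for a, b in zip(edges, edges[1:])]


TOY = "AAGATCTTTAGCTGGGTACCCATCGATAA"  # hypothetical 29 nt test sequence
for name in ENZYMES:
    print(name, [len(fragment) for fragment in digest(TOY, name)])
```

Because each enzyme cuts at different positions, digesting the same DNA with several enzymes yields overlapping sets of small fragments, which is what allows the individual sequences to be joined later.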

Plasmid DNA from the clones was prepared using alkaline lysis and an RNAse enzyme treatment as recommended by the Promega literature on pGEM3Z. Before use in the sequencing reaction, 4 ug of each plasmid was alkaline denatured with a brief treatment with 0.4M sodium hydroxide, precipitated with ethanol and annealed with 10 picomoles of oligonucleotide primer according to the procedure recommended by Pharmacia LKB Biotechnology (Uppsala, Sweden) for their T7 DNA polymerase sequencing kit, which was used for the sequencing reaction. The sequencing reaction used ATP radio-labelled with ³⁵S. The radio-labelled DNA was separated on 6% acrylamide/8M urea gels which were then fixed in 10% methanol plus 10% acetic acid, dried and autoradiographed. The sequences revealed by the autoradiography films were entered into a computer, and overlapping sequences were then matched automatically and compiled into the complete DNA sequence of Ddp2.
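
The text does not state which program or algorithm was used to match the overlapping sequences. Purely as an illustration of the idea, the following Python sketch merges reads greedily by their longest suffix-prefix overlap. The reads are invented toy strings and the function names are hypothetical. A practical assembler would also need to handle reverse-complement reads and sequencing errors.

```python
def overlap_length(left, right, min_overlap=3):
    """Length of the longest suffix of left that is a prefix of right."""
    for k in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:k]):
            return k
    return 0


def merge_reads(reads, min_overlap=3):
    """Greedily join the pair of contigs with the largest overlap."""
    contigs = list(reads)
    while len(contigs) > 1:
        best = (0, None, None)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap_length(a, b, min_overlap)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        if k == 0:
            break  # no remaining overlaps; leave contigs separate
        merged = contigs[i] + contigs[j][k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (i, j)]
        contigs.append(merged)
    return contigs


# Three hypothetical reads that overlap by four bases each.
print(merge_reads(["ATGGCGTAC", "GTACGTTAG", "TTAGCCA"]))
```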

The full sequence of Ddp2 is available from the EMBL data base, accession number X51478.

EXAMPLE 2 Location of the Origin of Replication of Ddp2

In further experiments the Ddp2 origin of replication was located to within the HindIII--ClaI fragment (1153-1775 bp) of Ddp2 as in plasmid pMUW130.

pMUW111

The plasmid pMUW111 was constructed by inserting the 4.1 Kb HindIII to ScaI fragment of Ddp2 into the SalI site of BIOSX. BIOSX is an integrating D. discoideum/E. coli shuttle vector constructed by Nellen et al. (Gene 39 (1985) 155-163) and contains the ampicillin and kanamycin/G418 antibiotic resistance genes.

Ddp2 plasmid was first digested with restriction enzymes HindIII and ScaI. After the digestion was completed, the HindIII 5' overhang ends were made blunt using an end-filling reaction involving the enzyme DNA polymerase I "Klenow fragment". After this reaction was completed, the sample was fractionated in a 0.8% TBE agarose gel. The 4.1 Kb fragment was then excised from the gel and purified using a commercial kit, "Gene-Clean" (BIO101, Inc., USA). The purified DNA was then ligated with BIOSX that had been digested with SalI and end-filled. After ligation, the mixture was transformed into E. coli strain CES201 (Leach, D. R. F. and Stahl, F. W. (1983). Nature 305, 448-451). CES201 was made competent for transformation using the procedure published by Hanahan, D. (J. Mol. Biol. (1983) 166, 557-580). The transformation mixture was then plated onto Luria-agar containing 50 ug/ml ampicillin. E. coli ampicillin resistant transformants containing pMUW111 were confirmed by restriction fragment mapping of isolated plasmids and also by radioactive hybridization using Ddp2 as a probe.

10 ug of pMUW111 was then used to transform Dictyostelium axenic strain AX3K, using the standard calcium phosphate precipitation procedure developed by Nellen W. et al. (Mol. Cell. Biol. (1984) 4, 2890-2898) with G418 selection. To determine if pMUW111 was capable of autonomous replication, total nuclear DNA was isolated from G418 resistant transformants and then screened by the "lysis in the gel" method described by Noegel A. et al (J. Mol. Biol. (1985) 185, 447-450). The gel was then Southern-transferred onto Zeta-probe blotting membrane (Bio-Rad) and hybridized using ³²P-labelled Ddp2 DNA. Autoradiography showed that pMUW111 had a higher mobility than the bulk chromosomal DNA, indicating it existed as an autonomously replicating plasmid.

pMUW102

The plasmid pMUW102 was constructed by inserting the 3.2 Kb SalI to XhoI fragment of Ddp2 into the SalI site of BIOSX. This fragment contained only part of the open reading frame. Hence a complete functional protein(s) would not be expected to be produced by this construct.

Ddp2 plasmid was first digested with restriction enzymes SalI and XhoI. The sample was then fractionated in a 0.8% TBE agarose gel. The 3.2 Kb fragment was then excised from the gel and purified using a commercial kit, "Gene-Clean". The purified DNA was then ligated with BIOSX that had been digested with SalI. After ligation, the mixture was transformed into competent E. coli strain CES201. The transformation mixture was then plated onto Luria-agar containing 50 ug/ml ampicillin. E. coli ampicillin resistant transformants containing pMUW102 were confirmed by restriction fragment mapping of isolated plasmids and also by radioactive hybridization using Ddp2 as a probe.

10 ug of pMUW102 was then used to transform D. discoideum axenic strain AX3K, using the standard calcium phosphate precipitation procedure with G418 selection. To determine the fate of pMUW102, total nuclear DNA was isolated from G418 resistant transformants and then screened by the "lysis in the gel" method. The gel was then Southern-blotted onto Zeta-probe blotting membrane and hybridized using ³²P-labelled Ddp2 DNA. Autoradiography showed that pMUW102 had the same mobility as the bulk chromosomal DNA, indicating it had integrated into the chromosomal DNA and was not capable of existing as a free plasmid. This experiment demonstrated that an intact open reading frame is essential for existence as an autonomously replicating plasmid.

pMUW110

The plasmid pMUW110 was constructed by inserting the 3.4 Kb BglII to ScaI fragment of Ddp2 into the SalI site of BIOSX. This fragment contained the whole open reading frame ("Rep gene") and the 5' and 3' flanking sequences that control the production of protein(s) specified by the open reading frame.

Ddp2 plasmid was first digested with restriction enzymes ScaI and BglII. After the digestion was completed, the BglII 5' overhang ends were made blunt using an end-filling reaction involving the enzyme DNA polymerase I "Klenow fragment". After this reaction was completed, the sample was fractionated in a 0.8% TBE agarose gel. The 3.4 Kb fragment was then excised from the gel and purified using a commercial kit, "Gene-Clean". The purified DNA was then ligated with BIOSX that had been digested with SalI and end-filled.

After ligation, the mixture was transformed into E. coli strain CES201 that had been made competent for transformation. The transformation mixture was then plated onto Luria-agar containing 50 ug/ml ampicillin. E. coli ampicillin resistant transformants containing pMUW110 were confirmed by restriction fragment mapping of isolated plasmids and also by radioactive hybridization using Ddp2 as a probe.

10 ug of pMUW110 was then used to transform D. discoideum axenic strain AX3K, using the standard calcium phosphate precipitation procedure with G418 selection. To determine the fate of pMUW110, total nuclear DNA was isolated from G418 resistant transformants and then screened by the "lysis in the gel" method. The gel was then Southern-transferred onto Zeta-probe blotting membrane and hybridized using ³²P-labelled Ddp2 DNA. Autoradiography showed that pMUW110 had the same mobility as the bulk chromosomal DNA, indicating it had integrated into the chromosomal DNA and was not capable of existing as a free plasmid.

The difference between pMUW111 and pMUW110 is that the 732 nucleotides between the HindIII restriction enzyme site at 1153 base pairs and the BglII restriction enzyme site at 1885 base pairs are missing in pMUW110. Hence the inability of pMUW110 to exist as a plasmid in AX3K could be explained by one of the following:

i) The 732 bp sequence contained part of the origin of replication (ORI) of the plasmid Ddp2.

ii) The 732 bp sequence contained cis acting element(s) that control the production of protein(s) specified by the open reading frame.

The first explanation was found to be correct by a subsequent experiment involving the co-transformation of AX3K with both pMUW102 and pMUW110. Screening of the G418-resistant transformants revealed that pMUW102 had a higher mobility than the bulk chromosomal DNA. This proved that pMUW102 could exist as an extrachromosomal plasmid only in the presence of pMUW110, which contains the intact open reading frame and hence is capable of providing the trans-acting protein(s) required for pMUW102 to replicate as a plasmid.

pMUW130

The plasmid pMUW130 was constructed by inserting the 622 base pair HindIII to ClaI fragment from Ddp2 (i.e. 1153 base pair to 1775 base pair) into the commercial E. coli plasmid pGEM3Z (Promega Corporation, Madison, USA) which had been digested with AccI and HindIII restriction enzymes. The construction of the plasmid used the same procedure as that of pMUW102 (above) except that the E. coli strain used was HB101.

Plasmid pMUW130 contains most of the 732 base pair sequence that is present in plasmid pMUW102 but not in plasmid pMUW110, and which was thought to be required for extrachromosomal replication. An experiment where pMUW130 and pMUW110 were co-transformed into D. discoideum strain AX3K demonstrated that pMUW130 can replicate extrachromosomally in the presence of pMUW110 which has been integrated into the chromosomal DNA. This confirms that an origin of DNA replication is located on this small HindIII--ClaI fragment of Ddp2 DNA. At approximately 3.3 kilobase pairs of DNA, pMUW130 was substantially smaller than previous shuttle vectors that had been constructed for Dictyostelium spp.

The location of an origin of replication on the HindIII--ClaI fragment incorporated into plasmid pMUW130 raises interesting scientific questions as to whether the similar sequences that occur in the small HindIII fragment (66-1153 bp) are also capable of acting as an origin of replication. This was investigated by cloning the small HindIII fragment (66-1153 bp) into the HindIII site of plasmid BIOSX to form plasmid pMUW105. However, plasmid pMUW105 was unable to replicate extrachromosomally when mixed with plasmid pMUW110 (to provide the Rep gene) and transformed into D. discoideum strain AX3K. The small HindIII fragment in pMUW105 contains an entire, near perfect copy of the 501 bp inverted repeat sequence that forms most of the Ddp2 origin of replication in plasmid pMUW130. So the failure of pMUW105 to replicate extrachromosomally demonstrates that either the sequences just outside the 501 bp inverted repeat are essential for replication or the 11 nucleotide substitutions between the two copies of the 501 bp inverted repeat have prevented the copy in the small HindIII fragment in pMUW105 from acting as the origin of replication. Both of these possibilities result in the absence of, or changes to, copies of the DNA sequence TTTTTTGTCATGACACTTTTTTTTTTTTGTCATGACA (SEQ ID NO: 6), one copy of which lies just outside the 501 bp inverted repeat in pMUW130, while a second copy is altered in pMUW105. This sequence contains two copies of a 10 bp palindrome TGTCATGACA (SEQ ID NO: 1) (i.e. the two halves are symmetrical, so the complementary DNA strand will have the same sequence in the opposite orientation). Such palindromic sequences are typical of many sites recognized by DNA binding proteins, which would be consistent with this sequence being important for regulation of the origin of replication.
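
The palindromic property described here can be checked mechanically: a DNA palindrome is a sequence equal to its own reverse complement. The Python sketch below tests SEQ ID NO: 1 and counts its occurrences in SEQ ID NO: 6, using the two sequences exactly as printed above. The function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the sequence of the complementary strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def is_palindromic(seq):
    """True when the complementary strand reads the same 5' to 3'."""
    return seq == reverse_complement(seq)


PALINDROME = "TGTCATGACA"                         # SEQ ID NO: 1
ELEMENT = "TTTTTTGTCATGACACTTTTTTTTTTTTGTCATGACA"  # SEQ ID NO: 6

print(is_palindromic(PALINDROME))  # True
print(ELEMENT.count(PALINDROME))   # 2
```

The first line of output confirms the symmetry of the 10 bp sequence, and the second confirms that the longer element carries two copies of it, each preceded by a run of thymidines.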

The Ddp2 origin of replication in plasmid pMUW130 contains two copies of the above oligo T sequence, each of which contains two palindromes. Deletion of one copy of the sequence by cutting out the HindIII--BglII restriction fragment (1153-1369 bp, numbered according to Ddp2) of plasmid pMUW130 produced plasmid pMUW138 which is unable to replicate extrachromosomally in D. discoideum, thus demonstrating the importance of this sequence for the function of the origin of replication. However, it is unlikely that this sequence is the actual origin of replication, which is believed to lie in flanking sequences.

EXAMPLE 3 Construction of a Small Shuttle Vector

A list of oligonucleotide sequences used in vector constructions is shown in Table 1.

Despite plasmid pMUW130 being a great improvement on all shuttle vectors previously available for D. discoideum, it has some drawbacks for use in the biotechnology industry. Plasmid pMUW130 contains a disrupted polylinker (concentrated region of restriction enzyme sites) and DNA sequences derived from the Lac operon and the parent pBR322 plasmid which are not required in a Dictyostelium vector.

Ideally, the restriction enzyme sites in an expression plasmid should be only in positions convenient for the manipulation of the gene to be expressed, and the amount of unnecessary DNA should be minimized. Shuttle plasmid pMUW1530 was designed specifically for the purpose of easy manipulation of inserted sequences. This plasmid contains the minimal sequences derived from pBR322 that allow replication in E. coli plus the ampicillin resistance selective marker. The "poison sequences" that are known to interfere with replication from the SV40 origin of replication (Lusky & Botchan (1981) Nature 293, 79-81) and gene expression in mammalian cells (Peterson et al (1987) Mol. Cell. Biol. 7, 1563-1567) were excluded, although as yet their influence on D. discoideum plasmids is unknown. Other features of the plasmid include the creation of two unique six base restriction sites (BamHI and ClaI) in positions suitable for the insertion of expression cassettes or selective markers.

TABLE 1

LIST OF OLIGONUCLEOTIDE SEQUENCES USED IN VECTOR CONSTRUCTION.

The sequence (5' to 3') of the oligonucleotides synthesised at Macquarie University is shown together with the approximate position of restriction enzyme cutting sites.

PCR primers for cloning the actin15 promoter
GA190   (SEQ ID NO: 10).   ##STR1##
GA188   (SEQ ID NO: 11).   ##STR2##

PCR primers for cloning the actin15 3' region
GA189   (SEQ ID NO: 16).   ##STR3##
GA186   (SEQ ID NO: 17).   ##STR4##

PCR primers for cloning the secretion signal from the D19 gene
GA187   (SEQ ID NO: 12).   ##STR5##
GA182   (SEQ ID NO: 13).   ##STR6##

Linker inserted into NdeI site to complete secretion signal sequence
GA297   (SEQ ID NO: 14).   ##STR7##
GA296   (SEQ ID NO: 15).   ##STR8##

PCR primers used to clone pGEM3Z origin of replication
GA181   (SEQ ID NO: 8).    ##STR9##
GA179   (SEQ ID NO: 7).    ##STR10##

Sequencing oligonucleotide for pMUW1410
GA220   (SEQ ID NO: 9).    GAAGCATTTATCAGGG

Linker used to clone the gene for B-glucuronidase
GA310   (SEQ ID NO: 18).   ##STR11##

pMUW1410

Plasmid pMUW1410 is an E. coli plasmid which was made to be the basis for construction of a series of shuttle vectors, including pMUW1530.

Plasmid pMUW1410 was constructed using two synthetic oligonucleotides GA179 (SEQ ID NO: 7) and GA181 (SEQ ID NO: 8) as primers to amplify the required pGEM3Z sequence in a polymerase chain reaction (PCR). The two oligonucleotide primers were each designed as two sections, the 5' end of the sequences containing restriction sites required for cloning and the 3' end of the sequences specifically matching the sequence of the plasmid pGEM3Z. The 3' end of the oligonucleotide GA179 (SEQ ID NO: 7) is the same as the pGEM3Z nucleotides 452-472 bp (Promega Corp. numbering system) while the 3' end of oligonucleotide GA181 (SEQ ID NO: 8) is complementary to pGEM3Z nucleotides 2254-2240 bp, i.e. they prime opposite strands of the pGEM3Z DNA during the PCR reaction.
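
The two-section primer design described above can be sketched in Python as follows. This is a simplified illustration rather than a reconstruction of GA179 and GA181: the template is an invented 32 nt toy sequence, the 5' tail carries only a BamHI site (GGATCC) whereas the actual oligonucleotides also supplied other sites, and all function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the sequence of the complementary strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def design_primers(template, start, end, anneal_len, tail_fwd, tail_rev):
    """Build primers as a 5' cloning tail plus a 3' template-matching part.

    start and end are 1-based inclusive coordinates on the top strand.
    """
    region = template[start - 1:end]
    forward = tail_fwd + region[:anneal_len]                       # same as top strand
    reverse = tail_rev + reverse_complement(region[-anneal_len:])  # primes opposite strand
    return forward, reverse


def pcr_product(template, start, end, tail_fwd, tail_rev):
    """Top strand of the product: target region flanked by both tails."""
    region = template[start - 1:end]
    return tail_fwd + region + reverse_complement(tail_rev)


TEMPLATE = "ACGTTGCAAGGCTTACCGATGCATTGACCGTA"  # hypothetical template
BAMHI = "GGATCC"

fwd, rev = design_primers(TEMPLATE, 5, 28, 8, BAMHI, BAMHI)
print(fwd)
print(rev)
print(len(pcr_product(TEMPLATE, 5, 28, BAMHI, BAMHI)))
```

Because the tails are not present in the template, they appear only in the amplified product, which therefore carries a BamHI site at each end and can be digested and self-ligated into a circle.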

The PCR reaction was carried out using 10 ng of pGEM3Z cut with restriction enzyme PvuII to linearize the plasmid, 20 picomoles of each oligonucleotide, 0.03 mM of each of the four deoxynucleotide triphosphates dATP, dTTP, dCTP and dGTP, Taq polymerase buffer (Biores) to a final volume of 50 ul and 1.25 units of Taq polymerase (Biores). The reaction was carried out for eight cycles using 120 second incubations at 95 degrees to denature, 50 degrees to anneal and 72 degrees for the extension reaction. The polymerase was removed from the product of the PCR reaction by extracting with phenol, then chloroform, and the DNA was precipitated with ethanol at -20 degrees. The product of the PCR (which consisted of the pGEM3Z sequence 452-2254 bp flanked by the sequences of the two oligonucleotides GA179 (SEQ ID NO: 7) and GA181 (SEQ ID NO: 8)) was then digested with the restriction enzyme BamHI to cleave the BamHI sites at the 5' end of the two oligonucleotides. The enzyme was then removed by extraction with 50% phenol/chloroform, then chloroform, and the DNA was precipitated with three volumes of ethanol at -70 degrees. Finally, the DNA product of the PCR reaction was self ligated using the BamHI sticky ends to form intact plasmids and the plasmids transformed into the E. coli strain Dh5a (Bethesda Research Laboratories) by electroporation using the procedures recommended by Biorad, the manufacturer of the "Gene pulser" equipment. The transformed cells were spread onto LB agar containing 100 ug ampicillin per ml. E. coli clones resistant to ampicillin were selected, their plasmids (e.g. pMUW1410) prepared by alkaline lysis and checked for size and the desired pattern of restriction enzyme sites using agarose gel electrophoresis.

The plasmid pMUW1410 was approximately 1.8 Kb in size as expected for the desired portion of pGEM3Z (452-2254 bp) containing the pBR322 origin of replication and the ampicillin resistance gene. Indeed, the ability of the E. coli clone containing pMUW1410 to replicate on ampicillin agar means the plasmid must contain a functional origin of replication and the ampicillin resistance gene. pMUW1410 also contains restriction sites for ClaI, BamHI and NheI derived from the synthetic oligonucleotides. The sequence of the plasmid pMUW1410 in the region of the BamHI site was confirmed using a T7 polymerase sequencing kit (Pharmacia) and a synthetic oligonucleotide GA220 (SEQ ID NO: 9) which is designed to anneal to the ampicillin resistance gene (2149-2164 bp, pGEM3Z numbering) so that the sequencing reaction covers the sequence derived from the oligonucleotides GA179 (SEQ ID NO: 7) and GA181 (SEQ ID NO: 8). The sequencing reaction confirmed that the oligonucleotides GA179 and GA181 used to create pMUW1410 had in fact bound to the expected positions in pGEM3Z and excluded the possibility of errors due to mispriming at any other position.
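The expected size can be checked from the quoted coordinates alone. The following is a minimal sketch, assuming the pGEM3Z coordinates are 1-based and inclusive; the function name is illustrative, and the short 5' tails contributed by the primers are ignored.

```python
def amplified_length(start: int, end: int) -> int:
    """Length in bp of a template segment given inclusive, 1-based coordinates."""
    if end < start:
        raise ValueError("end must not precede start")
    return end - start + 1


# pGEM3Z segment lying between the 3' ends of primers GA179 and GA181
segment_bp = amplified_length(452, 2254)
print(f"{segment_bp} bp (~{segment_bp / 1000:.1f} Kb)")  # 1803 bp (~1.8 Kb)
```

The result agrees with the approximately 1.8 Kb observed for pMUW1410.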

pMUW1530

Shuttle vector pMUW1530 was constructed by inserting the XbaI--ClaI fragment (1175-1775 bp) of Ddp2 containing the origin of replication into the NheI and ClaI sites of plasmid pMUW1410.

Plasmid pMUW1015 containing the large AluI (1155-3223 bp) fragment of Ddp2 was used as the source of the Ddp2 origin of replication. 10 ug of pMUW1015 was digested with XbaI and EcoRI restriction enzymes and a 1.2 Kb DNA fragment (i.e. 1175-2436 bp of Ddp2) isolated by agarose gel purification. The appropriate DNA band was excised from the electrophoresis gel and frozen to disrupt the gel matrix. The DNA was extracted using the centrifugation method of Heery et al ((1990) TIG 6, 173) and then phenol/chloroform extracted and ethanol precipitated to remove traces of the ethidium bromide stain. The DNA was further digested with the ClaI restriction enzyme and the 0.6 Kb XbaI--ClaI fragment (1175-1775 bp, Ddp2 numbering) gel purified as described above.

Plasmid pMUW1410 was digested with the restriction enzyme NheI and subsequently with enzyme ClaI, since the NheI site is too close to the ClaI site to cut efficiently after the ClaI enzyme has cut. The digested DNA was then dephosphorylated by adding 1/40th volume of 20% SDS, 1/6th volume of 1M Tris buffer pH 9.0 and then 1 unit of calf intestinal alkaline phosphatase (Boehringer) and incubating at 37 degrees for one hour. The enzyme was then removed by extracting with 50% phenol/chloroform followed by chloroform extraction and then the DNA was precipitated with ammonium acetate and two volumes of ethanol.

The XbaI--ClaI fragment from plasmid pMUW1015 (i.e. the Ddp2 origin of replication) prepared above was ligated into the plasmid pMUW1410 (cut with NheI and ClaI and treated with alkaline phosphatase), transformed into the E. coli strain "Sure" (Stratagene) and plated onto LB agar containing 100 ug ampicillin per ml. E. coli clones resistant to ampicillin were selected, their plasmids (e.g. pMUW1530) prepared by alkaline lysis and checked for size and the desired pattern of restriction enzyme sites using agarose gel electrophoresis.

Plasmid pMUW1530 is a 2.4 Kb shuttle plasmid containing the Ddp2 origin of replication inserted into the NheI and ClaI sites of plasmid pMUW1410. Evidence confirming this includes the presence of the BglII and NdeI sites from the Ddp2 origin of replication at the expected distance from the BamHI and ClaI sites found in pMUW1410. pMUW1530 does not contain the XbaI or NheI restriction sites used for cloning since the compatible "sticky ends" were destroyed by the ligation.
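The loss of both cloning sites can be illustrated with the published recognition sequences of XbaI (T^CTAGA) and NheI (G^CTAGC). This is a minimal sketch; the dictionary and function names are illustrative and not part of the described procedure.

```python
# Recognition sequence (5'->3', top strand) and position of the cut within it.
SITES = {
    "XbaI": ("TCTAGA", 1),  # T^CTAGA
    "NheI": ("GCTAGC", 1),  # G^CTAGC
}


def overhang(enzyme: str) -> str:
    """Single-stranded 5' overhang left by a palindromic site cut at `cut`."""
    site, cut = SITES[enzyme]
    return site[cut:len(site) - cut]


def hybrid_junction(upstream: str, downstream: str) -> str:
    """Top-strand junction when an end cut by `upstream` is ligated
    to an end cut by `downstream`."""
    if overhang(upstream) != overhang(downstream):
        raise ValueError("overhangs are not compatible")
    up_site, up_cut = SITES[upstream]
    down_site, down_cut = SITES[downstream]
    return up_site[:up_cut] + down_site[down_cut:]


for pair in (("XbaI", "NheI"), ("NheI", "XbaI")):
    junction = hybrid_junction(*pair)
    recut = [name for name, (site, _) in SITES.items() if site == junction]
    print(pair, junction, "recut by:", recut or "neither enzyme")
```

Both enzymes leave a CTAG overhang, so the ends anneal, but the hybrid junctions (TCTAGC or GCTAGA) match neither recognition sequence.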

5 ug of pMUW1530 mixed with 5 ug of plasmid pMUW110 was then used to transform D. discoideum axenic strain AX3K, using the standard calcium phosphate precipitation procedure with G418 selection. G418 resistant transformants were screened by "lysis in a gel", Southern blotting onto Zeta-probe membrane and probing with ³²P-labelled pGEM3Z. This demonstrated the presence of an extrachromosomal plasmid with the size of plasmid pMUW1530 containing pGEM3Z DNA sequences.

pMUW1570

Shuttle vector pMUW1570 is the same as pMUW1530, but with the NdeI restriction site removed to allow NdeI to be used for the manipulation of genes cloned into the plasmid.

Plasmid pMUW1530 was digested with the NdeI restriction enzyme in 11 ul of 10 mM Tris buffer pH 7.5, 10 mM MgCl2 and 50 mM NaCl. The ends of the DNA were then filled by simply adding 1 unit of T7 polymerase and 3 ul of the "C long" mix of deoxynucleotides supplied with the Pharmacia T7 polymerase sequencing kit and incubating at room temperature for five minutes. The plasmid was then religated by adding 2 ul ligation buffer (Boehringer), adjusting the volume to 20 ul by adding water and 1 unit of T4 ligase and then incubating at 4 degrees overnight. The religated plasmid was transformed into the E. coli strain "Sure" (Stratagene) and plated onto LB agar containing 100 ug ampicillin per ml. E. coli clones resistant to ampicillin were selected, their plasmids (e.g. pMUW1570) prepared by alkaline lysis and checked for size and the absence of the NdeI restriction site.

pMUW1580

Shuttle vector pMUW1580 is the same as pMUW1570, but with the BglII restriction site removed to allow BglII to be used for the manipulation of genes cloned into the plasmid.

Plasmid pMUW1570 was digested with the BglII restriction enzyme, end filled with T7 polymerase, self ligated and transformed into E. coli using the same procedures as for pMUW1570. E. coli clones resistant to ampicillin were selected, their plasmids (e.g. pMUW1580) prepared by alkaline lysis and checked for size and the absence of the BglII restriction site.

Plasmid pMUW1580 contains a second ClaI site created by end filling the BglII site. However, in most strains of E. coli this sequence is methylated so that the ClaI enzyme will not cut the new ClaI site.
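The effect of end filling on the NdeI and BglII sites can be worked through from the published recognition sequences (NdeI CA^TATG, BglII A^GATCT, ClaI ATCGAT, and GATC as the target of the E. coli Dam methylase). This is a minimal sketch and the function name is illustrative.

```python
def fill_in_and_religate(site: str, cut: int) -> str:
    """Top-strand sequence after cutting a palindromic site that leaves a
    5' overhang, filling in both ends and blunt-end religating."""
    filled_upstream = site[:len(site) - cut]  # upstream arm plus copied overhang
    filled_downstream = site[cut:]            # overhang plus downstream arm
    return filled_upstream + filled_downstream


NDEI = ("CATATG", 2)   # CA^TATG
BGLII = ("AGATCT", 1)  # A^GATCT
CLAI = "ATCGAT"
DAM = "GATC"

ndei_filled = fill_in_and_religate(*NDEI)
bglii_filled = fill_in_and_religate(*BGLII)

print(ndei_filled, "NdeI site kept:", NDEI[0] in ndei_filled)      # CATATATG ... False
print(bglii_filled, "BglII site kept:", BGLII[0] in bglii_filled)  # AGATCGATCT ... False
print("new ClaI site:", CLAI in bglii_filled)                      # True

# Dam (GATC) sites that overlap the newly created ClaI site
clai_start = bglii_filled.find(CLAI)
dam_overlaps = [
    i for i in range(len(bglii_filled) - len(DAM) + 1)
    if bglii_filled[i:i + len(DAM)] == DAM
    and i < clai_start + len(CLAI) and i + len(DAM) > clai_start
]
print("overlapping Dam sites at:", dam_overlaps)                   # [1, 5]
```

Filling the BglII overhang duplicates GATC, which creates ATCGAT flanked on both sides by Dam target sequences, consistent with the new ClaI site being protected in Dam-proficient E. coli strains.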

5 ug of pMUW1580 was mixed with 5 ug of plasmid pMUW110 and used to transform the D. discoideum axenic strain AX3K, using the standard calcium phosphate precipitation procedure with G418 selection. G418 resistant transformants were screened by "lysis in a gel", Southern blotting onto Zeta-probe membrane and probing with ³²P-labelled pGEM3Z. This demonstrated the presence of an extrachromosomal plasmid with the size of plasmid pMUW1580 containing pGEM3Z DNA sequences. Thus, plasmid pMUW1580 is a small, 2.4 Kb shuttle vector containing the minimum number of six base restriction sites, which is particularly suitable for use in the construction of expression vectors.

EXAMPLE 4 Construction of an Expression Cassette

An "expression cassette" is a single, easily cloned piece of DNA which contains, in their correct relative positions, all the sequences required to ensure expression of a gene and the correct processing of the messenger RNA and protein product. Usually the cassette contains a number of restriction sites (polylinker) behind the promoter in a good position for inserting the gene to be expressed. The use of a well designed expression cassette greatly facilitates the expression of a range of genes and is much preferred to the alternative of cloning all the necessary DNA sequences on an ad hoc basis.

We have designed a novel expression cassette specifically for insertion into the BamHI site of the shuttle vectors pMUW1530, pMUW1570 and pMUW1580. The expression cassette is designed to minimize the amount of unnecessary DNA sequences and restriction sites. This was achieved by cloning the required control and signal sequences using PCR techniques to insert at key positions the restriction sites required for cloning, using sites that can be destroyed during the construction procedure. The cassette contains a promoter from the D. discoideum actin 15 gene, a sequence coding for a secretion signal peptide, a polylinker containing restriction sites allowing the insertion of genes for expression and a polyadenylation signal sequence from the D. discoideum actin 15 gene.

Each component section of the expression cassette was cloned separately and then assembled into the complete cassette inside the polylinker of pGEM3Z.

Cloning the Actin 15 Promoter, Plasmid pMUW1480

The actin 15 promoter was selected because it is well characterised and is known to be expressed at a relatively high level soon after the onset of starvation (Cohen et al (1986) EMBO J. 5, 3361-3366). For the purpose of the production of recombinant proteins, this pattern of expression is desirable to avoid the protein being produced during active growth where the resulting metabolic drain may cause a selective advantage for any non-secreting mutants.

The two synthetic oligonucleotides GA190 (SEQ ID NO: 10) and GA188 (SEQ ID NO: 11) were used as primers to amplify the required actin 15 promoter sequence in a polymerase chain reaction (PCR). The two oligonucleotide primers were each designed as two sections, the 5' end of the sequences containing restriction sites required for cloning and the 3' end of the sequences specifically matching the sequence of the Actin 15 gene in plasmid pTS1 (Chang et al (1989) Nucleic Acids Res. 17, 3655-3661). The 3' end of the oligonucleotide GA190 (SEQ ID NO: 10) is the same as the promoter nucleotides between -247 and -230 (numbering back from A of the ATG start codon) while the 3' end of oligonucleotide GA188 (SEQ ID NO: 11) is complementary to nucleotides between +3 and -13, i.e. they prime opposite strands of the actin 15 DNA during the PCR reaction.

The PCR reaction was carried out using 30 ng of pTS1 cut with restriction enzymes PvuII and ScaI to ensure the plasmid is unable to replicate during later cloning steps, 20 pmoles of each oligonucleotide, 0.03 mM of each of the four deoxynucleotide triphosphates dATP, dTTP, dCTP and dGTP, Taq polymerase buffer (Biores) to a final volume of 50 ul and 1.25 units of Taq polymerase (Biores). The reaction was carried out for ten cycles using 120 second incubations at 95 degrees to denature, 40 degrees to anneal and 72 degrees for the extension reaction. At the end of the PCR reaction, 1 unit of T4 polymerase was added and incubated at room temperature for 15 minutes to ensure the ends of the DNA were blunt. 20 ug of glycogen in 1 ul (Boehringer) and 2 ul of acetate buffer were added (to aid precipitation of the small DNA fragments) before the polymerases were removed by extracting with 50% phenol in chloroform, then chloroform, and the DNA was precipitated with three volumes of ethanol at -70 degrees.

The product of the PCR reaction (consisting of the Actin 15 promoter sequence between -247 and +3 relative to the start codon flanked by the sequences of the two oligonucleotides GA190 (SEQ ID NO: 10) and GA188 (SEQ ID NO: 11)) was shown to have the expected size of approximately 300 bp by electrophoresis in 1.6% agarose against size markers (BRESA) of phage SPP-1 digested with the restriction enzyme EcoRI.
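The length of the amplified promoter region follows from the coordinates given, provided the numbering is read with the A of the ATG as +1 and no position 0. That convention is an assumption of this sketch (it is the usual one and is consistent with the 250 bp promoter size stated in this example); the function name is illustrative.

```python
def span_length(start: int, end: int) -> int:
    """Nucleotides from `start` to `end` inclusive, where +1 is the A of the
    ATG start codon and position 0 does not exist (assumed convention)."""
    if start == 0 or end == 0:
        raise ValueError("position 0 is not used in this numbering")
    if start > end:
        raise ValueError("start must not exceed end")
    length = end - start + 1
    if start < 0 < end:
        length -= 1  # do not count the non-existent position 0
    return length


print(span_length(-247, 3))     # 250: promoter region between the primer 3' ends
print(span_length(-247, -230))  # 18: template-matching part of GA190
print(span_length(-13, 3))      # 16: template-matching part of GA188
```

The remaining difference between 250 bp and the approximately 300 bp product is accounted for by the 5' tails of the two primers.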

The DNA product of the PCR reaction was mixed with 100 ng of pGEM3Z which had been cut with the restriction enzyme SmaI to create blunt ends. The mixture was ligated with 3 units of T4 ligase in ligation buffer for two hours at room temperature and then precipitated with ammonium acetate and two volumes of ethanol. The religated plasmids were transformed into the E. coli strain Dh5a (Bethesda Research Laboratories) by electroporation using the procedures recommended by Biorad, the manufacturer of the "Gene pulser" equipment. The transformed cells were plated onto LB agar containing 100 ug ampicillin per ml, 0.5 mM IPTG (isopropyl-B-D-thiogalactopyranoside) and 50 ug X-Gal (5-bromo-4-chloro-3-indolyl-B-galactoside) per ml. E. coli clones resistant to ampicillin and producing large white colonies (indicating the plasmid has DNA inserted into the polylinker) were selected, their plasmids (e.g. pMUW1480) prepared by alkaline lysis and checked for size and the desired pattern of restriction enzyme sites using agarose gel electrophoresis.

The plasmid pMUW1480 digested by the restriction enzyme PvuII produced a fragment of approximately 700 bp, comprised of 379 bp of pGEM3Z sequences containing an approximately 300 bp insert, as expected for the desired actin 15 promoter (250 bp) flanked by the sequences derived from the synthetic oligonucleotides GA190 (SEQ ID NO: 10) and GA188 (SEQ ID NO: 11). pMUW1480 also contains restriction sites for HindIII, BglII, NsiI and FokI derived from the synthetic oligonucleotides. The identity of the promoter inserted into plasmid pMUW1480 was confirmed by sequencing using a T7 polymerase sequencing kit (Pharmacia) and commercially supplied oligonucleotides (Promega) which anneal to the SP6 and T7 regions flanking the polylinker. The sequencing excludes any possibility of errors in the sequence.

Cloning a Sequence for a Secretion Signal, Plasmid pMUW1450

Secretion of a protein requires a signal sequence at the amino terminal end of the polypeptide. This signal peptide is the first part of the protein to be translated and causes the ribosome to bind to the endoplasmic reticulum membranes and feed the nascent polypeptide across the membrane into the lumen of the endoplasmic reticulum. Subsequently, the signal peptide is cleaved from the rest of the protein.

The D. discoideum protein PsA possesses a 20 amino acid signal peptide which has characteristics typical of many eukaryotic signal peptides (Perlman & Halvorson (1983) J. Mol. Biol. 167, 391-409) and so it should be a reliable signal to use for the secretion of recombinant proteins.

The two synthetic oligonucleotides GA187 (SEQ ID NO: 12) and GA182 (SEQ ID NO: 13) were used as primers in a PCR reaction to amplify the DNA sequence coding for the PsA signal peptide from the D19 gene encoding the PsA protein (Early et al (1988) Mol. Cell. Biol. 8, 3458-3466). The methods used were the same as described for cloning the actin 15 promoter (see above). However, some difficulty occurred in cloning the correct product of the PCR reaction.

The plasmid pMUW1450 gave the correct size fragment when digested by the restriction enzyme PvuII, but when the insert was sequenced it was found that the oligonucleotide GA182 (SEQ ID NO: 13) had not annealed to the D19 DNA in the anticipated position at the 3' end of the signal sequence. The DNA cloned in plasmid pMUW1450 contained the first oligonucleotide GA187 (SEQ ID NO: 12) in the correct position 5' to the D19 start codon, but the DNA sequence continued past the end of the signal peptide as far as the PvuII site near the center of the gene. Investigation of the reason for the failure of the oligonucleotide GA182 to prime the PCR reaction correctly established that this sequence forms a hairpin loop, so it was unlikely to be available for binding to the D19 gene.
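A primer can be screened for this kind of self-complementarity before synthesis. The following is a minimal sketch of one simple approach (searching for a short stem whose reverse complement occurs further 3' in the same oligonucleotide); it is not the analysis used here. The test sequence is hypothetical and is not GA182, and the stem and loop thresholds are arbitrary assumptions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_hairpins(oligo: str, stem: int = 4, min_loop: int = 3):
    """Return (stem_sequence, stem_start, partner_start) for every stretch of
    `stem` bases that can pair with a stretch further 3' in the same oligo,
    leaving at least `min_loop` unpaired bases in between."""
    oligo = oligo.upper()
    hits = []
    for i in range(len(oligo) - 2 * stem - min_loop + 1):
        arm = oligo[i:i + stem]
        partner = reverse_complement(arm)
        j = oligo.find(partner, i + stem + min_loop)
        while j != -1:
            hits.append((arm, i, j))
            j = oligo.find(partner, j + 1)
    return hits


# Hypothetical oligonucleotides, for illustration only
hairpin_prone = "TTGCAGCTAAAAGCTGCAAT"
unstructured = "ACACACACACACACAC"

print(find_hairpins(hairpin_prone))
print(find_hairpins(unstructured))
```

Any hit suggests the 3' end of the primer may fold back on itself rather than anneal to the template.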

An alternative approach to modifying the 3' end of the DNA coding for the PsA signal peptide is described below.

Fusion of the Promoter with the D19 (PsA) gene, Plasmid pMUW1545

Plasmid pMUW1450 contains the restriction sites derived from oligonucleotide GA187 that are required for the promoter and D19 gene sequences to be fused. This required a three way ligation to force clone the two DNA fragments into the NdeI and HindIII sites of pGEM3Z.

The DNA fragments to be fused were prepared by cutting 5 ug of plasmid pMUW1450 with the restriction enzymes NdeI and ScaI and then purifying the largest (1.8 Kb) DNA fragment containing the D19 sequences by gel purification as described previously. The NdeI cleavage site at the end of this fragment occurs within the D19 sequence coding for the signal peptide. The promoter sequence was prepared by cutting 5 ug of plasmid pMUW1480 with the HindIII and EcoRI restriction enzymes, which cut the HindIII site in the sequence derived from oligonucleotide GA190 (SEQ ID NO: 10) 5' to the promoter and the EcoRI site in the polylinker, yielding a 0.3 Kb fragment which was then purified by gel electrophoresis. The D19 and promoter fragments were mixed together and digested with the FokI restriction enzyme, which creates compatible ends at the ATG start codons in both sequences. The FokI digested fragments were extracted with 50% phenol in chloroform, then chloroform, and then precipitated with three volumes of ethanol at -70 degrees. The FokI fragments were ligated with 0.5 ug of pGEM3Z which had been cut with HindIII and NdeI, treated with alkaline phosphatase and purified by gel electrophoresis as described for plasmid pMUW1410. The religated plasmids were transformed into the E. coli "Sure" strain as described above and plated onto LB agar containing 100 ug ampicillin per ml. E. coli clones resistant to ampicillin were selected, their plasmids (e.g. pMUW1545) prepared by alkaline lysis and checked for the presence of the BglII restriction site from the oligonucleotide GA190 (SEQ ID NO: 10), and the absence of the NsiI restriction sites that should have been removed from both of the inserted fragments.

Construction of the Full Promoter/Signal Sequence, pMUW1594

In order to replace the 3' end of the D19 sequence coding for the signal peptide, a synthetic DNA sequence was cloned into the NdeI restriction site of plasmid pMUW1545. The synthetic DNA sequence is composed of two synthetic oligonucleotides, GA297 (SEQ ID NO: 14) (+ve strand) and a complementary sequence GA296 (SEQ ID NO: 15) (-ve strand), which anneal to form double stranded DNA with ends compatible with the NdeI restriction site. In designing these oligonucleotides, the opportunity was taken to change the DNA sequence to optimize the codon usage for highly expressed genes, remove the potential to form hairpin loops and to remove the NdeI restriction site used to insert the oligonucleotides, leaving a single NdeI site suitable for cloning at the signal peptide cleavage site. The DNA sequence changes do not alter the encoded amino acid sequence of the signal peptide.
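A recoded sequence of this kind can be checked for two properties: that it still encodes the same amino acids, and that an unwanted restriction site has been removed. The following is a minimal sketch using the standard genetic code; the two short sequences are hypothetical and are not taken from GA297 or the D19 gene.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate an in-frame DNA sequence (length a multiple of 3)."""
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def is_synonymous(original: str, recoded: str) -> bool:
    """True if both sequences encode the same amino acid sequence."""
    return translate(original) == translate(recoded)


NDEI = "CATATG"
original = "GCACATATGAAA"  # hypothetical; contains an NdeI site
recoded = "GCTCACATGAAA"   # hypothetical; synonymous codon changes only

print(translate(original), translate(recoded))
print("same peptide:", is_synonymous(original, recoded))
print("NdeI site before/after:", NDEI in original, NDEI in recoded)
```

Codon preference for highly expressed genes would require a D. discoideum codon usage table, which is not reproduced in this sketch.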

The oligonucleotides GA297 (SEQ ID NO: 14) and GA296 (SEQ ID NO: 15) were phosphorylated with T4 kinase. 50 pmoles of each oligonucleotide in 50 ul "One-for-all" buffer (Pharmacia) and 2 uM dATP were incubated with 20 units T4 kinase at 37 degrees for 30 minutes and then the enzyme destroyed by heating to 100 degrees.

Plasmid pMUW1545 was linearized with NdeI restriction enzyme and ligated with 1 pmole of phosphorylated oligonucleotides GA297 and GA296 using 0.5 units of T4 ligase at 4 degrees overnight. The religated plasmids were transformed into the E. coli strain "Sure" (Stratagene) and after one hour incubation at 37 degrees the organisms were inoculated into 500 ml LB broth containing 100 ug ampicillin per ml. After being shaken for 18 hours at 37 degrees, the cells were harvested and the mixed population of plasmids purified by alkaline lysis. 5 ug of the resulting mixture of plasmids were digested with the HindIII restriction enzyme and an approximately 0.3 Kb fragment of DNA purified by gel electrophoresis as described above. This 0.3 Kb fragment of DNA could only come from plasmids that are cut in the polylinker and also have the synthetic DNA sequence (which contains a second HindIII site) inserted into the NdeI restriction site of pMUW1545. Thus, this 0.3 Kb fragment contains the full promoter/signal sequence construct.

The 0.3 Kb promoter--signal sequence was ligated into pGEM3Z that had been cut with HindIII, treated with alkaline phosphatase and purified by gel electrophoresis. The religated plasmids were transformed into the E. coli strain "Sure" (Stratagene) and plated onto LB agar containing 100 ug ampicillin per ml. E. coli clones resistant to ampicillin were selected and their plasmids (e.g. pMUW1594) prepared by alkaline lysis. Plasmids were checked for size and the correct orientation of the promoter (i.e. 5' to the polylinker) using the position of the BglII site 5' to the promoter. Clones were further screened by T7 polymerase sequencing (Pharmacia) using oligonucleotide GA187 (SEQ ID NO: 12) to check the orientation of the inserted synthetic DNA sequence. Plasmid pMUW1594 had the required promoter and signal sequence in frame with the pGEM3Z polylinker encoded lac operon sequences.

Cloning the Actin 15 Polyadenylation Signal, Plasmid pMUW1512

The two synthetic oligonucleotides GA189 (SEQ ID NO: 16) and GA186 (SEQ ID NO: 17) were used as primers to amplify the actin 15 polyadenylation sequence in a polymerase chain reaction (PCR). The two oligonucleotide primers were each designed as two sections, the 5' end of the sequences containing restriction sites required for cloning and the 3' end of the sequences specifically matching the sequence of the Actin 15 gene in plasmid pTS1 (Chang et al (1989) Nucleic Acids Res. 17, 3655-3661). The 3' end of the oligonucleotide GA189 is designed to bind at the stop codon of the actin 15 gene and has one extra base pair added to the original sequence in order to place stop codons in all three reading frames, while the 3' end of oligonucleotide GA186 (SEQ ID NO: 17) is complementary to the sequence approximately 305 bp 3', immediately preceding an EcoRV restriction site. The oligonucleotide GA186 replaces the EcoRV restriction site with BglII and EcoRI sites for use in cloning.
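Whether a sequence carries stop codons in all three reading frames can be checked mechanically. The following is a minimal sketch; both test sequences are hypothetical and are not the GA189 sequence, and they serve only to show how a single base offset between stop codons brings them into different frames.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def frames_with_stop(seq: str) -> set:
    """Return the forward reading frames (0, 1, 2) that contain a stop codon."""
    seq = seq.upper()
    return {
        frame
        for frame in range(3)
        for i in range(frame, len(seq) - 2, 3)
        if seq[i:i + 3] in STOP_CODONS
    }


def stops_in_all_frames(seq: str) -> bool:
    return frames_with_stop(seq) == {0, 1, 2}


# Hypothetical sequences, for illustration only
in_register = "TAATAATAA"    # three stops, all in the same frame
staggered = "TAAATAAATAA"    # one extra base between successive stops

print(frames_with_stop(in_register), stops_in_all_frames(in_register))
print(frames_with_stop(staggered), stops_in_all_frames(staggered))
```

With stops in every frame, translation terminates regardless of the frame in which a gene is joined to the polylinker.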

The PCR amplification of the polyadenylation sequence was carried out using the identical DNA preparations and methods to the cloning of the actin 15 promoter described above, apart from the use of a different pair of oligonucleotides and the transformation of the plasmids into the "Sure" strain of E. coli. The plasmids produced (e.g. pMUW1512) were digested by the restriction enzyme PvuII and screened for the presence of a fragment of approximately 800 bp, comprised of 379 bp of pGEM3Z sequences containing an approximately 400 bp insert. The plasmids were further digested with the restriction enzymes BglII, EcoRI and KpnI (separately) to check for the presence of the restriction sites from the two oligonucleotides. Plasmids pMUW1512 and pMUW1515 (opposite orientations of the insert) were sequenced to confirm the polyadenylation signal contained no errors using a T7 polymerase sequencing kit (Pharmacia) and commercially supplied oligonucleotides (Promega) which anneal to SP6 and T7 regions flanking the polylinker.

5 ug of plasmid pMUW1512 was digested with KpnI and subsequently with EcoRI restriction enzymes and an approximately 0.4 Kb fragment containing the polyadenylation signal purified by gel electrophoresis as described previously. This 0.4 Kb fragment was ligated into 1 ug of plasmid pGEM3Z which was also digested with KpnI and EcoRI, treated with alkaline phosphatase and then purified by gel electrophoresis. The plasmids were transformed into E. coli strain "Sure" and plated onto LB agar containing ampicillin as described previously. Plasmids (e.g. pMUW1560) from the ampicillin resistant clones were screened for the correct sized insert (0.4 Kb) and the presence of a BglII site derived from oligonucleotide GA186 (SEQ ID NO: 17). Plasmid pMUW1560 contains the actin 15 polyadenylation signal in the correct position and orientation for the final expression cassette.

Construction of the Complete Expression Cassette, Plasmid pMUW1621

The expression cassette was completed in a single cloning step combining the fused promoter/signal sequence from plasmid pMUW1594 with the polyadenylation sequence from plasmid pMUW1560.

Plasmid pMUW1560 was digested with the restriction enzymes SalI and ScaI and the smaller 1.2 Kb fragment containing the polyadenylation signal purified by gel electrophoresis as previously described. Plasmid pMUW1594 was also digested with SalI and ScaI enzymes and the larger 2 Kb fragment containing the promoter and signal sequence purified by gel electrophoresis. The two fragments were pooled, ligated and transformed into the "Sure" strain of E. coli. The identity of the isolated plasmids (e.g. pMUW1621) was confirmed by cutting with the restriction enzyme BglII to produce a 0.7 Kb fragment. This fragment can only be produced by plasmids containing two BglII sites, one derived from the oligonucleotide GA190 (SEQ ID NO: 10) used to clone the promoter and the second derived from oligonucleotide GA186 (SEQ ID NO: 17) used to clone the polyadenylation signal.

Insertion of the Expression Cassette into the Shuttle Vector

The shuttle vector pMUW1580 was linearized using restriction enzyme BamHI, treated with alkaline phosphatase and purified by gel electrophoresis as previously described. The expression cassette in the 0.7 Kb BglII fragment from plasmid pMUW1621 was also purified by gel electrophoresis and ligated into the linearized plasmid pMUW1580. The ends of the DNA fragments produced by the BglII and BamHI enzymes are compatible, so both restriction sites are destroyed in the ligation. The resulting plasmids produced in the E. coli "Sure" strain were digested with ClaI and HindIII enzymes to screen for the presence of the polylinker in the expression cassette and the orientation of the expression cassette in the plasmid. Plasmids pMUW1630 and pMUW1633 had opposite orientations of the expression cassette.

Insertion of the GUS Gene into the Expression Vector

The GUS gene is the E. coli gene for the enzyme B-glucuronidase which has been modified by the insertion of SalI and NcoI restriction enzyme sites at the start codon of the gene, an EcoRI site at the 3' end of the gene and a BamHI site removed from the center of the gene (Jefferson et al (1986) PNAS 83, 8447). Plasmid pRAJ275 containing this construct was purchased from Clontech Laboratories Inc., California, USA.

In order that the GUS gene could be easily sequenced, it was inserted into pGEM3Z. The GUS gene was cut out of plasmid pRAJ275 with the restriction enzymes SalI and EcoRI, purified by gel electrophoresis and ligated into plasmid pGEM3Z which had been cut with the same enzymes, treated with alkaline phosphatase and gel purified. The plasmid with the GUS gene inserted was pMUW1550.

A SmaI restriction site was inserted into the EcoRI site of plasmid pMUW1550 using oligonucleotide GA310 (SEQ ID NO: 18) as a linker. Oligonucleotide GA310 was phosphorylated as previously described in the section on the construction of the full promoter/signal sequence. 1 pmole of phosphorylated GA310 (SEQ ID NO: 18) was mixed with 0.5 ug of plasmid pMUW1550 which had been cut with the EcoRI restriction enzyme and purified by gel electrophoresis. The mixture was ligated at 4 degrees overnight and then transformed into the E. coli "Sure" strain. The transformants were incubated for one hour in SOC medium and then inoculated into 50 ml of LB broth containing 100 ug ampicillin per ml. After shaking at 37 degrees for 18 hours the cells were harvested, plasmids purified and cut with the SmaI restriction enzyme. Only the plasmids containing the oligonucleotide GA310 (SEQ ID NO: 18) contain a SmaI site, so the linearized plasmids were purified by gel electrophoresis, religated and transformed back into the E. coli strain "Sure".

1 ug of plasmid pMUW1558 containing the GUS gene with the SmaI restriction site inserted into the EcoRI site at the 3' end of the gene was cut with the restriction enzymes SalI and SmaI and the 1.9 Kb gene purified by gel electrophoresis. The polylinker in the expression vector pMUW1630 was also cut with the restriction enzymes SalI and SmaI, treated with alkaline phosphatase and purified by gel electrophoresis. The two purified DNA fragments were ligated, transformed into the "Sure" strain of E. coli and plasmids purified from ampicillin resistant clones. Plasmid pMUW1653 contained the GUS gene cloned in frame into the SalI site of the expression vector. This was confirmed by restriction mapping using the sites for NcoI and EcoRI enzymes at the 5' and 3' ends of the GUS gene respectively. The region of the fusion between the sequence encoding the secretion signal and the 5' end of the GUS gene was confirmed by DNA sequencing using a T7 polymerase kit (Pharmacia) and oligonucleotide GA187 (SEQ ID NO: 12).

Expression of the GUS gene in D. discoideum

The suitability of the expression vector for the expression of recombinant genes was confirmed by transforming 5 ug of the expression plasmid pMUW1653 (containing the E. coli B-glucuronidase gene) and 5 ug of plasmid pMUW110 (containing the Ddp2 Rep gene and a G418 resistance marker) into D. discoideum strain AX3K using the calcium phosphate precipitation procedure described previously. After one week under G418 selection, the culture supernatant from the transformants was tested for GUS enzyme activity using 1 mM p-nitrophenyl-B-D-glucuronide substrate in 50 mM sodium phosphate pH 7.0, 10 mM 2-mercaptoethanol and 0.1% Triton X-100. A green colouration indicated the presence of the enzyme B-glucuronidase secreted from D. discoideum. Culture supernatants from cells transformed with the expression vector pMUW1630 did not contain B-glucuronidase.

It will be appreciated by persons skilled in the art that numerous variations and/or modifications may be made to the invention as shown in the specific embodiments without departing from the spirit or scope of the invention as broadly described. The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive.

__________________________________________________________________________
SEQUENCE LISTING
(1) GENERAL INFORMATION:
(iii) NUMBER OF SEQUENCES: 19
(2) INFORMATION FOR SEQ ID NO:1:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 10 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iv) ANTI-SENSE: NO
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:
TGTCATGACA 10
(2) INFORMATION FOR SEQ ID NO:2:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 5852 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iv) ANTI-SENSE: NO
(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 2378..5038
(ix) FEATURE:
                                                          (A) NAME/KEY: CDS                                                              (B) LOCATION: 2378..5038                                                       (xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:                                        TCGACAAATATCAAGGGTTGGAATCTTGTAAAAATTTTCCCGTTATCGCAAACAATCAAA60                 GTTTAAGCTTCA ATCTTCAATAATAATTTTAACTTTATCTCTTTCAATTTTAATAATTTT120               TTTCAAAAATTGAAAATGGTATAGATCGATAGATCACCTTTTTTAGAGATAAACCATGAA180                AAAGACATAAAAAATAAAGGTCATCAAAGTATTAAAAAAAATTAATTATCTTTTTAAC TT240               TGAAAAAAAAAAATAAAAAAAAATAAAAAAAAAAAATTCTTTGTTTTAATAACTTTTAAA300                ATTATTAAAAATAGTATAGATTTAAAGATCACAATTTTTTATAATTAACTACATAAAATT360                TATAAAAAATGAGGGTCATGAAGATATATAAATAA TTATTTAATTATTAAATATTTAATT420               ATTTATTTAACTTAAAAAAAAAAAAAAGGAAAAAAAGGAAAAAAAAAGTGAAAAAGGTGG480                GAAAATGAAAAAAAAAGTGAAAAAAATGCCCAAAAAAATTTTTATATGAGAAAAAAATTA540                CGTAAAAAAAAA ATAAGTCTGACCCAAATCGAAAAATAATAAAAGAGGGGAAAGTAATTA600               TAACTAGGTTAGTTTTTTATAATTTTTACATATTTGTTAATAACTTTTAATTTTGAATCA660                TATGATATTACATCGTCCCGTTGAAAAAAAAAAAAAAAATTTTTTTTTCAAACATTTT CA720               TTTTTTAAAAAATGATATAAAATTTTAAACTAAACTATTTTATTAAATACAAATATATAA780                CTTTATCTTAATCAATTTTTTTGGTTTATACATATTTATGTTCGTACTGAAGTATAGATC840                TTATTACTAAAGTTTCAAAAGTTTTAAAAAAAATT AAAGGGGGTAAATATATAACTTTCT900               GTTTTTTTCAATTCTGTCATGACAGAAAGGTAAAAAGTGTCATGACAAAAAAAAAAAAAA960                AAAAAATTTATTTCTTCAATAGGTATTGAAATGACCTCCGTTTTTAATAAAAAGTATATA1020               TTTGTGCTTTCC TAGATGAAATAAGGTTATTTGAGCTTAATTCAGATTATTATAAGATTA1080              TTATAAAAAAATGAAAAACTGTCATGACAGTTTTTGTAAGTTTCTTATAGTTTTTTTTAA1140               TGATCTGAATTAAGCTTAAATAACCTTATTTCATCTAGACGAGCACAAATATATACTT TT1200              TATTAAAAACGGAGGTCATTTCAATACCTATTGAAGAAATAAATTTTTTTTTTTTTTTTT1260               
TTTGTCATGACACTTTTTTTTTTTTGTCATGACAGAATTGAAAAAAACAGAAAGTTATAT1320               ATTTACCCCCTTTAATTTTTTTTAAAACTTTTGAA ACTTTAGTAATAAGATCTATACTTC1380              AGTACGAACATAAATATGTATAAACCAAAAAAATTGATTAAGATAAAGTTATATGTTTGT1440               ATTTAATAAAATAGTTTAGTTTAAAATTTTATATCATTTTTTAAAAAATGAAAATGTTTG1500               AAAAAAAAAATT TTTTTTTTTTTTTTCAACGGGACGATGTAATATCATATGATTCAAAAT1560              TAAAAGTTATTAACAAATATGTAAAAATTATAAAAAACTAACCTAGTTATAATTACTTTC1620               CCCTCTTTTTTTTTTTTTTTTTTGTCATGACACTTTTTTTTTTTTGTCATGACACTTT TT1680              TTTTAAAAAAAAAAAAAAAAATGTTAAAATACTATTTGATGACATTCATTTTTCCTAGTT1740               TTTTTTTAGATAGATATAAAAATAAATTGCCTATCGATATATACTTAATTTATTAAGATT1800               GAATAATATTTTAATTTTTAATAAATTCTACTTTT TTTTTTTTTTTCTTTTTTTTTTAAA1860              TTTTAAAATTTTTTTTTTTTATTAGATCTCATAATTAAAAATCAATTTAAAATTAAAAGT1920               TATTTTTAAATATGCAAAAACTATAAAAAACTAATGTAGTTTAACCAACTTTTTTCTATT1980               TCTTTTTTTTTT TTTTTTTTTTTTTTACTTTGAAAAAAAAAAAAAAAAAAAAAAAAAAAA2040              AAACCCTCATTATAAATATTAATTACTTTGGTTTTTTTTGATTTTTTTTTTAATAAATTT2100               AAAATTTTATTCTCTATCTAATTATACCTTATTTATAAATATTGGATAATATATCAAA TA2160              TTTATCAGTTTTGGCATGACAATTTTAATTATATTTATTTTTTGATTAATTTTTTTTTTT2220               TTTTTTTTTTAAAATTTCTTTTTTTTTTTTTTTATTTTTAATTTTTAATTTTTATTTTTC2280               CCACACTTTCATTTTATTTTATTTTATTTATTGTA AATTCATTTTATTTATTTTTAATTA2340              AATAGTTTTGGTTTAATTTTATTCAAAGATTTTAAAAATGGACGAACTTATTTCT2395                    MetAspGluLeuIleSer                                                              15                                                                            TGGGATAGGTTTTTTAAGTTTTTTGTAATACTTTTGGAAGAATTCAAA2443                           TrpAspArgPhePheLysPhePheValIleLeuLeuGluGluPheLys                               10 1520                                                                        GGTTGTAAAAGAAATGATGTGCGTTTGAGTGTCGATTATGACATTCTT2491                           
GlyCysLysArgAsnAspValArgLeuSerValAspTyrAspIleLeu                               25 3035                                                                        TCTGGTATTTATTCGCCACGTACATTTGTACTAAAGGAAGTCTTTAGA2539                           SerGlyIleTyrSerProArgThrPheValLeuLysGluValPheArg                               4045 50                                                                        GCAGTGGCCGTCTCTTATGATGAATCTGAAATAGATTTATTCAGATTG2587                           AlaValAlaValSerTyrAspGluSerGluIleAspLeuPheArgLeu                               5560 6570                                                                      GGTTCAGTGTTTCCTGGTACTTCTTTATATTCATATATTCCAGGTATT2635                           GlySerValPheProGlyThrSerLeuTyrSerTyrIleProGlyIle                               75 8085                                                                        TTCAGTTTAAAAGATTTCCTTTTAATTTCAAAAACTAAATCGGGTAAA2683                           PheSerLeuLysAspPheLeuLeuIleSerLysThrLysSerGlyLys                               9095 100                                                                       ATAAGAGTTTCGGATGTAGATCAAGCAATATTAATTTTTGATCATTTT2731                           IleArgValSerAspValAspGlnAlaIleLeuIlePheAspHisPhe                               105110 115                                                                     TCTAGAATTTCAGATAAACAAGTATTTCGTAAAGATATTATTCCAGGT2779                           SerArgIleSerAspLysGlnValPheArgLysAspIleIleProGly                               1201251 30                                                                     TATAGAACCTTTGAAAAATCAATATCGAGCGAGTACAAAATCTCGGAT2827                           TyrArgThrPheGluLysSerIleSerSerGluTyrLysIleSerAsp                               135140145 150                                                                  GGTCGTGCTGCAGGAGTGAGTTGGTTCAATTTAGTTAGTAAAATAAGC2875                           GlyArgAlaAlaGlyValSerTrpPheAsnLeuValSerLysIleSer                               155160 165               
                                                      ACTTATTGTAAAAATCATCCCTTGTTTGCCGAAAATCCAACATATAAA2923                           ThrTyrCysLysAsnHisProLeuPheAlaGluAsnProThrTyrLys                               170175 180                                                                     CATGTGGATTTTATATCAATGTTATCACTGGTGCATGGAATCATTGTT2971                           HisValAspPheIleSerMetLeuSerLeuValHisGlyIleIleVal                               185190195                                                                       GATTCCCAAAATGAAGATGAGAATAATGTTTCGGCAATGTACTCTCTG3019                          AspSerGlnAsnGluAspGluAsnAsnValSerAlaMetTyrSerLeu                               200205210                                                                      AATCCTTT TGTGGATCTTGAAAAAAGTGATATACCAGGGGCTGTTCAA3067                          AsnProPheValAspLeuGluLysSerAspIleProGlyAlaValGln                               215220225230                                                                   AGTA GAGTTACTACAAATAGAACTAGAGGTTCAAGGTCTAATTCCAAT3115                          SerArgValThrThrAsnArgThrArgGlySerArgSerAsnSerAsn                               235240245                                                                      TTG AATAATCCAACAACAACAACAACTACTACTACCACTACTACAACT3163                          LeuAsnAsnProThrThrThrThrThrThrThrThrThrThrThrThr                               250255260                                                                      ACCGCA CCAATTACTACTAGAAGTAAAAGAAAATCTGACGACTCTGTA3211                          ThrAlaProIleThrThrArgSerLysArgLysSerAspAspSerVal                               265270275                                                                      CAAGAACAAAG CTCACGACAACCAAAAACCTCGAGAAAGTCTGGTTCT3259                          GlnGluGlnSerSerArgGlnProLysThrSerArgLysSerGlySer                               280285290                                                                      CTTAAGGATGTCAGAATTA 
ACAATATATCAGTAGATTCAAGTTCCAGT3307                          LeuLysAspValArgIleAsnAsnIleSerValAspSerSerSerSer                               295300305310                                                                   GAATCTGATGTGATT ATGTCAGTTTCAAACCGTTTAAAATGTTATCTT3355                          GluSerAspValIleMetSerValSerAsnArgLeuLysCysTyrLeu                               315320325                                                                      TTGGAAGCAGTTGTA AACAAAGGAGAGATCGGTTTAGAAGTCGTCAAA3403                          LeuGluAlaValValAsnLysGlyGluIleGlyLeuGluValValLys                               330335340                                                                      GAAGTTTTAAAAGATTT ACAGGACAAAAATTATGCCACAGGTTTACTT3451                          GluValLeuLysAspLeuGlnAspLysAsnTyrAlaThrGlyLeuLeu                               345350355                                                                      GAAAACATTTTCAATCACAACA AGTCTGAAAGGGTCATAACACTTTCA3499                          GluAsnIlePheAsnHisAsnLysSerGluArgValIleThrLeuSer                               360365370                                                                      AGTAGTTTTTTTGAAATTGCTTCAAAAATT AACTATGATGAAGTTAAG3547                          SerSerPhePheGluIleAlaSerLysIleAsnTyrAspGluValLys                               375380385390                                                                   TTCAGTGAACTCAGTATTGATGTTCTG GAATCGGCAAAGAGATTAACA3595                          PheSerGluLeuSerIleAspValLeuGluSerAlaLysArgLeuThr                               395400405                                                                      TTCGAGAAAAATACAAATATATTAAT TCCAACCAATAATTTTAAAGAA3643                          PheGluLysAsnThrAsnIleLeuIleProThrAsnAsnPheLysGlu                               410415420                                                                      GGTTTTGAATTTTTATGGGTTCCAATTG TTAATGGTATTGCTTCAACT3691                          
GlyPheGluPheLeuTrpValProIleValAsnGlyIleAlaSerThr                               425430435                                                                      TCTGTCTTTGTTTCACCAAATAATTATTCAAGT GGTTCATTTGCAAAT3739                          SerValPheValSerProAsnAsnTyrSerSerGlySerPheAlaAsn                               440445450                                                                      GTAGAATCTGCTTTAAAGTTGATTCATCTTTGCATTTCTTTA GGAAAT3787                          ValGluSerAlaLeuLysLeuIleHisLeuCysIleSerLeuGlyAsn                               455460465470                                                                   ATAAATGGTTTCCTCTCTATTAGATCAATTACATTTGA TACATTTAAA3835                          IleAsnGlyPheLeuSerIleArgSerIleThrPheAspThrPheLys                               475480485                                                                      TCGATTACAAAGGATCTTATTCCAATGTCGAAAAGAA TGCTGGACCTT3883                          SerIleThrLysAspLeuIleProMetSerLysArgMetLeuAspLeu                               490495500                                                                      GAACAAGGCTTCCGAAAACTTAGAGATGCTTGGAATAAT AGTAATAAA3931                          GluGlnGlyPheArgLysLeuArgAspAlaTrpAsnAsnSerAsnLys                               505510515                                                                      AAATCCAAAGTTCAAGATAGTGATATTAGTGGCATCGATACAGAG GAT3979                          LysSerLysValGlnAspSerAspIleSerGlyIleAspThrGluAsp                               520525530                                                                      ACAAAGTTGATATCATTTGTCCACGAGTTTATAAATGATAATTTATAT40 27                          ThrLysLeuIleSerPheValHisGluPheIleAsnAspAsnLeuTyr                               535540545550                                                                   TTAAAACTATCAAAAGAAGAAGATGGACTAATGCTAGTAGACTTTCCA 4075                          LeuLysLeuSerLysGluGluAspGlyLeuMetLeuValAspPhePro                               555560565                
                                                      ACATCAACACTTTTTATGAGATACAATCCAAATAGCATTGATAACAAA 4123                          ThrSerThrLeuPheMetArgTyrAsnProAsnSerIleAspAsnLys                               570575580                                                                      GTTGGTTTCATGTTCCATTGCCGTTCAGAGATTTCAAAGTTTCAAAGT 4171                          ValGlyPheMetPheHisCysArgSerGluIleSerLysPheGlnSer                               585590595                                                                      TGTAAAAACCACTCGATAGATAACCTTGTTTTATCATTTACTCCAAAT4219                            CysLysAsnHisSerIleAspAsnLeuValLeuSerPheThrProAsn                              600605610                                                                      AACATTAAAAATATATCACAGGATAATGAAAATGAGCTTAAAAAGAAA4267                           AsnIleLys AsnIleSerGlnAspAsnGluAsnGluLeuLysLysLys                              615620625630                                                                   TATTCGTTGATGGTCAGTGATTTTAGAAATGTTCCAAAGGTGACACCA4315                           TyrSe rLeuMetValSerAspPheArgAsnValProLysValThrPro                              635640645                                                                      AAATTTATACCTTCTGAATTTAAAAGGTTTACAATCATTACGTTCACA4363                           LysP heIleProSerGluPheLysArgPheThrIleIleThrPheThr                              650655660                                                                      AACAATTCATACAATGCCAATAGAGTATTTGCGTTTGACGACATCTCA4411                           AsnAsn SerTyrAsnAlaAsnArgValPheAlaPheAspAspIleSer                              665670675                                                                      AGTGGAATTTCAATCACAAATGTTAAAAATATCCACGCAAAGGGTCAA4459                           SerGlyIleSer IleThrAsnValLysAsnIleHisAlaLysGlyGln                              680685690                                                                      
CGAAACTTTGAAATCTACGAAACATTACTGGGAAGTACCAGGATTATT4507                           ArgAsnPheGluIleTyrGl uThrLeuLeuGlySerThrArgIleIle                              695700705710                                                                   CGTGCATTTTTCTGCGCTCCATGCTTGATCCAAATCAATAATTTTAAA4555                           ArgAlaPhePheCysA laProCysLeuIleGlnIleAsnAsnPheLys                              715720725                                                                      TTTGCCACAGATAAGTTAATTGATGACCAAAGTGTAAATCACCAGATT4603                           PheAlaThrAspLys LeuIleAspAspGlnSerValAsnHisGlnIle                              730735740                                                                      GCATCTTTGGAAATTAAAAACTTATCATATCTTCCGCTCGACATCAAG4651                           AlaSerLeuGluIleLys AsnLeuSerTyrLeuProLeuAspIleLys                              745750755                                                                      GTTAGAGGTAGTACAGTTGGAACGATTAAGGGTGGAGAGACAGCTCCT4699                           ValArgGlySerThrValGlyTh rIleLysGlyGlyGluThrAlaPro                              760765770                                                                      ATTATTATAAACTCAGAAGAATTTACGTTTTCTATCTCATGCCTTGAT4747                           IleIleIleAsnSerGluGluPheThrPheS erIleSerCysLeuAsp                              775780785790                                                                   ATTAGATTTAGTGCATCCTTAATTTCTAAAACAAAACTAAGCCAACTT4795                           IleArgPheSerAlaSerLeuIleSer LysThrLysLeuSerGlnLeu                              795800805                                                                      CCAACATTTGCTCCAGATGAAAGGTACAATAAAGAGACTAACATTTTA4843                           ProThrPheAlaProAspGluArgTyr AsnLysGluThrAsnIleLeu                              810815820                                                                      AAAGTTTTGGATCAATGTGATGAACTTACTCGAACGTTTTTAAATAAC4891                           
LysValLeuAspGlnCysAspGluLeuTh rArgThrPheLeuAsnAsn                              825830835                                                                      TATAAAATAGCTAATAAACTATCAACCATTGAAAATTATTTATATAAT4939                           TyrLysIleAlaAsnLysLeuSerThrIleGluA snTyrLeuTyrAsn                              840845850                                                                      AATTTTATGGGACTAGAAGATGAAGATGAAGATGAAGATGAAGATGAA4987                           AsnPheMetGlyLeuGluAspGluAspGluAspGluAspGlu AspGlu                              855860865870                                                                   GATGAAGATGAAGATGAAGATGAAGATGAAGATGAAGACGAAGATGGG5035                           AspGluAspGluAspGluAspGluAspGluAspGluAsp GluAspGly                              875880885                                                                      TATTGAATTATCATACTTTAAAAATTAATTAAATAAATAAAAAAAAAAAAATG5088                      Tyr                                                                            ATTTCAATTTAAATATATACATATATATATATATAAAATG AGATTAATAAAACTTTTGAG5148              ACCAACATTTAATGAGATTTCTGATGCTGTTTATTTTGCCTGGAATGAGAGCAAAAGGCT5208               AAAAAACATGAGAGAGAATATAATAATAAAGGAAAACTTGGGAAAAAGGATCTAGTATCC5268               ATTTCCATATTAATCCGT GCAGTACTATTAATTAAAAAAATACTTTAAAAAAAATTTTAA5328              AAACATGGAAAATTATATAGATCGATAGATCACTAATTTTTAAAATTAAATATATTAAAT5388               TTATAAAAATTGAAGTTCATCAAGATATATAGATAATTATTTAATTATTTGAATTTTTAA5 448              AAAAAAAAAAAAAAAAAAAAAAAAAATCAAATATGTTTATTGTTTTAAGATTTTTTAATC5508               TCGTCAATGATTTTAAAATAAAAATCGATACATAATTTTAAAAAAAACCCTTTACATTTT5568               TTATTTTAATTCCAAATTTATACATTTTTTATTTTTTTTT TTTTTTTTTTTTTTTTTTTT5628              AATTTAAATTTTTTTTTTTTTTTTTTTTTTATTTATTTAAAATTTAATTATTAATTTTAT5688               AAATAAAAAATAGAAATATAAGTAAAAAAACAAACAACAAATAACATATATAAAAAAATA5748               CAAATAACAAATAATTAA ATAAATTAAATAACCATTAAAAATGTATATTAATAAATTTAA5808              
AAGATCTTTATTAGTACTATTGTTACTTTGTAATATTCTTCCTG  5852

(2) INFORMATION FOR SEQ ID NO:3:
  (i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 887 amino acids
    (B) TYPE: amino acid
    (D) TOPOLOGY: linear
  (ii) MOLECULE TYPE: protein
  (xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:
MetAspGluLeuIleSerTrpAspArgPhePheLysPhePheValIle
  1 5 10 15
LeuLeuGluGluPheLysGlyCysLysArgAsnAspValArgLeuSer
  20 25 30
ValAspTyrAspIleLeuSerGlyIleTyrSerProArgThrPheVal
  35 40 45
LeuLysGluValPheArgAlaValAlaValSerTyrAspGluSerGlu
  50 55 60
IleAspLeuPheArgLeuGlySerValPheProGlyThrSerLeuTyr
  65 70 75 80
SerTyrIleProGlyIlePheSerLeuLysAspPheLeuLeuIleSer
  85 90 95
LysThrLysSerGlyLysIleArgValSerAspValAspGlnAlaIle
  100 105 110
LeuIlePheAspHisPheSerArgIleSerAspLysGlnValPheArg
  115 120 125
LysAspIleIleProGlyTyrArgThrPheGluLysSerIleSerSer
  130 135 140
                                                      GluTyrLysIleSerAspGlyArgAlaAlaGlyValSerTrpPheA sn                              145150155160                                                                   LeuValSerLysIleSerThrTyrCysLysAsnHisProLeuPheAla                               165170175                                                                      Glu AsnProThrTyrLysHisValAspPheIleSerMetLeuSerLeu                              180185190                                                                      ValHisGlyIleIleValAspSerGlnAsnGluAspGluAsnAsnVal                               195 200205                                                                     SerAlaMetTyrSerLeuAsnProPheValAspLeuGluLysSerAsp                               210215220                                                                      IleProGlyAlaValGlnSerArgValThrThrAs nArgThrArgGly                              225230235240                                                                   SerArgSerAsnSerAsnLeuAsnAsnProThrThrThrThrThrThr                               245250 255                                                                     ThrThrThrThrThrThrThrAlaProIleThrThrArgSerLysArg                               260265270                                                                      LysSerAspAspSerValGlnGluGlnSerSerArgGlnProLysThr                                275280285                                                                     SerArgLysSerGlySerLeuLysAspValArgIleAsnAsnIleSer                               290295300                                                                      ValAspSerSerSerSerGluSer AspValIleMetSerValSerAsn                              305310315320                                                                   ArgLeuLysCysTyrLeuLeuGluAlaValValAsnLysGlyGluIle                               325 330335                                                                     GlyLeuGluValValLysGluValLeuLysAspLeuGlnAspLysAsn  
                             340345350                                                                      TyrAlaThrGlyLeuLeuGluAsnIlePheAsnHisAs nLysSerGlu                              355360365                                                                      ArgValIleThrLeuSerSerSerPhePheGluIleAlaSerLysIle                               370375380                                                                      AsnTyrAspGlu ValLysPheSerGluLeuSerIleAspValLeuGlu                              385390395400                                                                   SerAlaLysArgLeuThrPheGluLysAsnThrAsnIleLeuIlePro                               405 410415                                                                     ThrAsnAsnPheLysGluGlyPheGluPheLeuTrpValProIleVal                               420425430                                                                      AsnGlyIleAlaSerThrSerValPhe ValSerProAsnAsnTyrSer                              435440445                                                                      SerGlySerPheAlaAsnValGluSerAlaLeuLysLeuIleHisLeu                               450455460                                                                      C ysIleSerLeuGlyAsnIleAsnGlyPheLeuSerIleArgSerIle                              465470475480                                                                   ThrPheAspThrPheLysSerIleThrLysAspLeuIleProMetSer                                485490495                                                                     LysArgMetLeuAspLeuGluGlnGlyPheArgLysLeuArgAspAla                               500505510                                                                      TrpAsnAsnSerAsn LysLysSerLysValGlnAspSerAspIleSer                              515520525                                                                      GlyIleAspThrGluAspThrLysLeuIleSerPheValHisGluPhe                               530535 540                                                                 
    IleAsnAspAsnLeuTyrLeuLysLeuSerLysGluGluAspGlyLeu                               545550555560                                                                   MetLeuValAspPheProThrSerThrLeuPheMetArgTyr AsnPro                              565570575                                                                      AsnSerIleAspAsnLysValGlyPheMetPheHisCysArgSerGlu                               580585590                                                                      IleS erLysPheGlnSerCysLysAsnHisSerIleAspAsnLeuVal                              595600605                                                                      LeuSerPheThrProAsnAsnIleLysAsnIleSerGlnAspAsnGlu                               610 615620                                                                     AsnGluLeuLysLysLysTyrSerLeuMetValSerAspPheArgAsn                               625630635640                                                                   ValProLysValThrProLysPheIlePro SerGluPheLysArgPhe                              645650655                                                                      ThrIleIleThrPheThrAsnAsnSerTyrAsnAlaAsnArgValPhe                               660665 670                                                                     AlaPheAspAspIleSerSerGlyIleSerIleThrAsnValLysAsn                               675680685                                                                      IleHisAlaLysGlyGlnArgAsnPheGluIleTyrGluThrLeuLeu                               69 0695700                                                                     GlySerThrArgIleIleArgAlaPhePheCysAlaProCysLeuIle                               705710715720                                                                   GlnIleAsnAsnPheLysP heAlaThrAspLysLeuIleAspAspGln                              725730735                                                                      SerValAsnHisGlnIleAlaSerLeuGluIleLysAsnLeuSerTyr                               7407 45750           
LeuProLeuAspIleLysValArgGlySerThrValGlyThrIleLys
  755 760 765
GlyGlyGluThrAlaProIleIleIleAsnSerGluGluPheThrPhe
  770 775 780
SerIleSerCysLeuAspIleArgPheSerAlaSerLeuIleSerLys
  785 790 795 800
ThrLysLeuSerGlnLeuProThrPheAlaProAspGluArgTyrAsn
  805 810 815
LysGluThrAsnIleLeuLysValLeuAspGlnCysAspGluLeuThr
  820 825 830
ArgThrPheLeuAsnAsnTyrLysIleAlaAsnLysLeuSerThrIle
  835 840 845
GluAsnTyrLeuTyrAsnAsnPheMetGlyLeuGluAspGluAspGlu
  850 855 860
AspGluAspGluAspGluAspGluAspGluAspGluAspGluAspGlu
  865 870 875 880
AspGluAspGluAspGlyTyr
  885

(2) INFORMATION FOR SEQ ID NO:4:
  (i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 3138 base pairs
    (B) TYPE: nucleic acid
    (C) STRANDEDNESS: single
    (D) TOPOLOGY: circular
  (ii) MOLECULE TYPE: DNA (genomic)
                                 (iv) ANTI-SENSE: NO                                                            (xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:                                        CGAT AGGTGGCACTTTTCGGGGAAATGTGCGCGGAACCCCTATTTGTTTATTTTTCTAAA60                TACATTCAAATATGTATCCGCTCATGAGACAATAACCCTGATAAATGCTTCAATAATATT120                GAAAAAGGAAGAGTATGAGTATTCAACATTTCCGTGTCGCCCTTATTCCC TTTTTTGCGG180               CATTTTGCCTTCCTGTTTTTGCTCACCCAGAAACGCTGGTGAAAGTAAAAGATGCTGAAG240                ATCAGTTGGGTGCACGAGTGGGTTACATCGAACTGGATCTCAACAGCGGTAAGATCCTTG300                AGAGTTTTCGCCCCGAAGAACGTTTTC CAATGATGAGCACTTTTAAAGTTCTGCTATGTG360               GCGCGGTATTATCCCGTATTGACGCCGGGCAAGAGCAACTCGGTCGCCGCATACACTATT420                CTCAGAATGACTTGGTTGAGTACTCACCAGTCACAGAAAAGCATCTTACGGATGGCATGA480                CAGT AAGAGAATTATGCAGTGCTGCCATAACCATGAGTGATAACACTGCGGCCAACTTAC540               TTCTGACAACGATCGGAGGACCGAAGGAGCTAACCGCTTTTTTGCACAACATGGGGGATC600                ATGTAACTCGCCTTGATCGTTGGGAACCGGAGCTGAATGAAGCCATACCA AACGACGAGC660               GTGACACCACGATGCCTGTAGCAATGCCAACAACGTTGCGCAAACTATTAACTGGCGAAC720                TACTTACTCTAGCTTCCCGGCAACAATTAATAGACTGGATGGAGGCGGATAAAGTTGCAG780                GACCACTTCTGCGCTCGGCCCTTCCGG CTGGCTGGTTTATTGCTGATAAATCTGGAGCCG840               GTGAGCGTGGGTCTCGCGGTATCATTGCAGCACTGGGGCCAGATGGTAAGCCCTCCCGTA900                TCGTAGTTATCTACACGACGGGGAGTCAGGCAACTATGGATGAACGAAATAGACAGATCG960                CTGA GATAGGTGCCTCACTGATTAAGCATTGGTAACTGTCAGACCAAGTTTACTCATATA1020              TACTTTAGATTGATTTAAAACTTCATTTTTAATTTAAAAGGATCTAGGTGAAGATCCTTT1080               TTGATAATCTCATGACCAAAATCCCTTAACGTGAGTTTTCGTTCCACTGA GCGTCAGACC1140              CCGTAGAAAAGATCAAAGGATCTTCTTGAGATCCTTTTTTTCTGCGCGTAATCTGCTGCT1200               TGCAAACAAAAAAACCACCGCTACCAGCGGTGGTTTGTTTGCCGGATCAAGAGCTACCAA1260               CTCTTTTTCCGAAGGTAACTGGCTTCA GCAGAGCGCAGATACCAAATACTGTCCTTCTAG1320              TGTAGCCGTAGTTAGGCCACCACTTCAAGAACTCTGTAGCACCGCCTACATACCTCGCTC1380       
        TGCTAATCCTGTTACCAGTGGCTGCTGCCAGTGGCGATAAGTCGTGTCTTACCGGGTTGG1440               ACTC AAGACGATAGTTACCGGATAAGGCGCAGCGGTCGGGCTGAACGGGGGGTTCGTGCA1500              CACAGCCCAGCTTGGAGCGAACGACCTACACCGAACTGAGATACCTACAGCGTGAGCTAT1560               GAGAAAGCGCCACGCTTCCCGAAGGGAGAAAGGCGGACAGGTATCCGGTA AGCGGCAGGG1620              TCGGAACAGGAGAGCGCACGAGGGAGCTTCCAGGGGGAAACGCCTGGTATCTTTATAGTC1680               CTGTCGGGTTTCGCCACCTCTGACTTGAGCGTCGATTTTTGTGATGCTCGTCAGGGGGGC1740               GGAGCCTATCGAAAAACGCCAGCAACG CGGCCTTTTTACGGTTCCTGGCCTTTTGCTGGC1800              CTTTTGCTGGCCTTTGGATCTACAAATTAATTAATCCCATCAAATCTTTAAAAAAAAAAA1860               TGGTTTAAAAAAACTTGGGTTGGTTAATTATTATTTGAAAATTTTAAAACCCAAATTAAA1920               AAAA AAAAATGGGATTCAAAAATTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT1980              TTTTTTTTCAGATTGCATAAAAAGATTTTTTTTTTTTTTTTTTCTTATTTCTTAAAACAA2040               ATAAATTAAATTAAATAAAAAATAAAAATGAAATTCCAACATACATTTAT TGCATTATTA2100              TCACTATTAACATACGCCAATGCATATGAAAGCTTGCATGCCTGCAGGTCGACTCTAGAG2160               GATCCCCGGGTACCTAAATCATGAATGAAAGTGCTTCACATAAAAATAATAATAATAATA2220               TAACAATAATAATATTTAAATGTATAA TAAAATTTAATTACTTTTTTTTTAATGGTTGTT2280              GATCTTTATCCGACCTTAAAAAAAAAAAAATAAAACCAATAGGCTATTGGTTTTTTTTTT2340               AATTGTTTTTTTATTTTTTATTATTACTTTAATTATCATTTTTTAAATTACAAAAAAAAT2400               TAAA AATCCAGATATTAAGGTATTTGCACTAGTGCTTTAACGTTAAAATTTGAAAAAAAA2460              AAAAAATTAATAATTTTACCCTTTATGGGTAAACGATTCTCACATATAATACAATCTCCA2520               TGAAAAGATCCGCTAGACGAGCACAAATATATACTTTTTATTAAAAACGG AGGTCATTTC2580              AATACCTATTGAAGAAATAAATTTTTTTTTTTTTTTTTTTTGTCATGACACTTTTTTTTT2640               TTTGTCATGACAGAATTGAAAAAAACAGAAAGTTATATATTTACCCCCTTTAATTTTTTT2700               TAAAACTTTTGAAACTTTAGTAATAAG ATCGATCTATACTTCAGTACGAACATAAATATG2760              TATAAACCAAAAAAATTGATTAAGATAAAGTTATATGTTTGTATTTAATAAAATAGTTTA2820               GTTTAAAATTTTATATCATTTTTTAAAAAATGAAAATGTTTGAAAAAAAAAATTTTTTTT2880               TTTT 
TTTTCAACGGGACGATGTAATATCATATATGATTCAAAATTAAAAGTTATTAACAA2940              ATATGTAAAAATTATAAAAAACTAACCTAGTTATAATTACTTTCCCCTCTTTTTTTTTTT3000               TTTTTTTGTCATGACACTTTTTTTTTTTTGTCATGACACTTTTTTTTTAA AAAAAAAAAA3060              AAAAATGTTAAAATACTATTTGATGACATTCATTTTTCCTAGTTTTTTTTTAGATAGATA3120               TAAAAATAAATTGCCTAT3138                                                         (2) INFORMATION FOR SEQ ID NO:5:                                               (i) SEQUENCE CHARACTERISTICS:                                                   (A) LENGTH: 2422 base pairs                                                   (B) TYPE: nucleic acid                                                         (C) STRANDEDNESS: single                                                       (D) TOPOLOGY: circular                                                         (ii) MOLECULE TYPE: DNA (genomic)                                              (iv) ANTI-SENSE: NO                                                            (xi) SEQUENCE DESCRIPTION: SEQ ID NO:5:                                        CGATAGGTGGCACTTTTCGGGGAAATGTGCGCGGAACCCCTATTTGTTTATTTTTCTAAA60                 TACATTCAAATAT GTATCCGCTCATGAGACAATAACCCTGATAAATGCTTCAATAATATT120               GAAAAAGGAAGAGTATGAGTATTCAACATTTCCGTGTCGCCCTTATTCCCTTTTTTGCGG180                CATTTTGCCTTCCTGTTTTTGCTCACCCAGAAACGCTGGTGAAAGTAAAAGATGCTGAA G240               ATCAGTTGGGTGCACGAGTGGGTTACATCGAACTGGATCTCAACAGCGGTAAGATCCTTG300                AGAGTTTTCGCCCCGAAGAACGTTTTCCAATGATGAGCACTTTTAAAGTTCTGCTATGTG360                GCGCGGTATTATCCCGTATTGACGCCGGGCAAGAGC AACTCGGTCGCCGCATACACTATT420               CTCAGAATGACTTGGTTGAGTACTCACCAGTCACAGAAAAGCATCTTACGGATGGCATGA480                CAGTAAGAGAATTATGCAGTGCTGCCATAACCATGAGTGATAACACTGCGGCCAACTTAC540                TTCTGACAACGAT CGGAGGACCGAAGGAGCTAACCGCTTTTTTGCACAACATGGGGGATC600               ATGTAACTCGCCTTGATCGTTGGGAACCGGAGCTGAATGAAGCCATACCAAACGACGAGC660                
GTGACACCACGATGCCTGTAGCAATGCCAACAACGTTGCGCAAACTATTAACTGGCGAA C720               TACTTACTCTAGCTTCCCGGCAACAATTAATAGACTGGATGGAGGCGGATAAAGTTGCAG780                GACCACTTCTGCGCTCGGCCCTTCCGGCTGGCTGGTTTATTGCTGATAAATCTGGAGCCG840                GTGAGCGTGGGTCTCGCGGTATCATTGCAGCACTGG GGCCAGATGGTAAGCCCTCCCGTA900               TCGTAGTTATCTACACGACGGGGAGTCAGGCAACTATGGATGAACGAAATAGACAGATCG960                CTGAGATAGGTGCCTCACTGATTAAGCATTGGTAACTGTCAGACCAAGTTTACTCATATA1020               TACTTTAGATTGA TTTAAAACTTCATTTTTAATTTAAAAGGATCTAGGTGAAGATCCTTT1080              TTGATAATCTCATGACCAAAATCCCTTAACGTGAGTTTTCGTTCCACTGAGCGTCAGACC1140               CCGTAGAAAAGATCAAAGGATCTTCTTGAGATCCTTTTTTTCTGCGCGTAATCTGCTGC T1200              TGCAAACAAAAAAACCACCGCTACCAGCGGTGGTTTGTTTGCCGGATCAAGAGCTACCAA1260               CTCTTTTTCCGAAGGTAACTGGCTTCAGCAGAGCGCAGATACCAAATACTGTCCTTCTAG1320               TGTAGCCGTAGTTAGGCCACCACTTCAAGAACTCTG TAGCACCGCCTACATACCTCGCTC1380              TGCTAATCCTGTTACCAGTGGCTGCTGCCAGTGGCGATAAGTCGTGTCTTACCGGGTTGG1440               ACTCAAGACGATAGTTACCGGATAAGGCGCAGCGGTCGGGCTGAACGGGGGGTTCGTGCA1500               CACAGCCCAGCTT GGAGCGAACGACCTACACCGAACTGAGATACCTACAGCGTGAGCTAT1560              GAGAAAGCGCCACGCTTCCCGAAGGGAGAAAGGCGGACAGGTATCCGGTAAGCGGCAGGG1620               TCGGAACAGGAGAGCGCACGAGGGAGCTTCCAGGGGGAAACGCCTGGTATCTTTATAGT C1680              CTGTCGGGTTTCGCCACCTCTGACTTGAGCGTCGATTTTTGTGATGCTCGTCAGGGGGGC1740               GGAGCCTATCGAAAAACGCCAGCAACGCGGCCTTTTTACGGTTCCTGGCCTTTTGCTGGC1800               CTTTTGCTGGCCTTTGGATCCGCTAGACGAGCACAA ATATATACTTTTTATTAAAAACGG1860              AGGTCATTTCAATACCTATTGAAGAAATAAATTTTTTTTTTTTTTTTTTTTGTCATGACA1920               CTTTTTTTTTTTTGTCATGACAGAATTGAAAAAAACAGAAAGTTATATATTTACCCCCTT1980               TAATTTTTTTTAA AACTTTTGAAACTTTAGTAATAAGATCTATACTTCAGTACGAACATA2040              AATATGTATAAACCAAAAAAATTGATTAAGATAAAGTTATATGTTTGTATTTAATAAAAT2100               AGTTTAGTTTAAAATTTTATATCATTTTTTAAAAAATGAAAATGTTTGAAAAAAAAAAT T2160              
TTTTTTTTTTTTTTCAACGGGACGATGTAATATCATATGATTCAAAATTAAAAGTTATTA 2220
ACAAATATGTAAAAATTATAAAAAACTAACCTAGTTATAATTACTTTCCCCTCTTTTTTT 2280
TTTTTTTTTTTGTCATGACACTTTTTTTTTTTTGTCATGACACTTTTTTTTTAAAAAAAA 2340
AAAAAAAAATGTTAAAATACTATTTGATGACATTCATTTTTCCTAGTTTTTTTTTAGATA 2400
GATATAAAAATAAATTGCCTAT 2422
(2) INFORMATION FOR SEQ ID NO:6:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 37 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iv) ANTI-SENSE: NO
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:6:
TTTTTTGTCATGACACTTTTTTTTTTTTGTCATGACA 37
(2) INFORMATION FOR SEQ ID NO:7:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 38 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iv) ANTI-SENSE: NO
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:7:
GGAGGGATCCAAAGGCCAGCAAAAGGCCAGCAAAAGGC 38
(2) INFORMATION FOR SEQ ID NO:8:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 42 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iv) ANTI-SENSE: NO
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:8:
GGGGTGGATCCGCTAGCCGCATCGATAGGTGGCACTTTTCGG 42
(2) INFORMATION FOR SEQ ID NO:9:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 16 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iv) ANTI-SENSE: NO
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:9:
GAAGCATTTATCAGGG 16
(2) INFORMATION FOR SEQ ID NO:10:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 34 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iv) ANTI-SENSE: NO
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:10:
TGGCCAAGCTTAGATCTACAAATTAATTAATCCC 34
(2) INFORMATION FOR SEQ ID NO:11:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 33 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iv) ANTI-SENSE: NO
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:11:
CCCGGGATGTTCACCATGCATTTTTATTTTTTA 33
(2) INFORMATION FOR SEQ ID NO:12:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 39 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iv) ANTI-SENSE: NO
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:12:
GGGAAGCTTGGATGAATTCAAAAAATGAAATTCCAACAT 39
(2) INFORMATION FOR SEQ ID NO:13:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 36 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iv) ANTI-SENSE: NO
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:13:
CCCGGGTCGACCTGCTATTGCATTTGCATATGTTAA 36
(2) INFORMATION FOR SEQ ID NO:14:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 22 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iv) ANTI-SENSE: NO
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:14:
CACGCCAATGCATATGAAAGCT 22
(2) INFORMATION FOR SEQ ID NO:15:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 22 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iv) ANTI-SENSE: NO
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:15:
TAAGCTTTCATATGCATTGGCG 22
(2) INFORMATION FOR SEQ ID NO:16:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 31 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iv) ANTI-SENSE: NO
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:16:
TGCCGGTACCTAAATCATGAATGAAAGTGCT 31
(2) INFORMATION FOR SEQ ID NO:17:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 34 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iv) ANTI-SENSE: NO
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:17:
CCCGGGAATTCAGATCTTTTCATGGAGATTGTAT 34
(2) INFORMATION FOR SEQ ID NO:18:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 10 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iv) ANTI-SENSE: NO
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:18:
AATTCCCGGG 10
(2) INFORMATION FOR SEQ ID NO:19:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 66 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iv) ANTI-SENSE: NO
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:19:
GATGAAGATGAAGATGAAGATGAAGATGAAGATGAAGATGAAGATGAAGATGAAGATGAA 60
GATGAA 66

We claim:
 1. A method of producing a desired polypeptide comprising the following steps: a) preparing a recombinant plasmid vector comprising an origin of replication from plasmid Ddp2 or pDG1 and a DNA sequence encoding the desired polypeptide, wherein said plasmid vector lacks DNA sequences encoding a Rep protein required for extrachromosomal replication in wild type Dictyostelium; b) preparing a recombinant strain of Dictyostelium comprising a gene encoding a Rep polypeptide which allows replication of the recombinant plasmid of step (a); c) transforming the recombinant Dictyostelium of step (b) with the recombinant plasmid vector of step (a); d) culturing the transformed Dictyostelium of step (c) under conditions which allow expression of the DNA sequence encoding the desired polypeptide; and e) recovering the desired polypeptide.
 2. A method as claimed in claim 1 in which the desired polypeptide is produced in a cell bound form.
 3. A method as claimed in claim 1 in which the gene encoding the Rep polypeptide is present on the chromosome of the recombinant strain.
 4. A method of producing a desired polypeptide comprising the following steps: a) preparing a recombinant plasmid vector comprising an origin of replication from plasmid Ddp2 or pDG1 and a DNA sequence encoding the desired polypeptide, wherein said plasmid vector lacks DNA sequences encoding a Rep protein required for extrachromosomal replication in wild type Dictyostelium; b) preparing a recombinant plasmid vector comprising a gene encoding a Rep polypeptide which allows replication of the recombinant plasmid of step (a); c) preparing a recombinant strain of Dictyostelium by transformation of Dictyostelium with the recombinant plasmid vectors of steps (a) and (b); d) culturing the transformed Dictyostelium of step (c) under conditions which allow expression of the DNA sequence encoding the desired polypeptide; and e) recovering the desired polypeptide.
 5. A recombinant strain of Dictyostelium which harbours a recombinant plasmid, said recombinant plasmid comprising an origin of replication from plasmid Ddp2 or plasmid pDG1, said plasmid lacking a functional Rep gene required for extrachromosomal replication in wild type Dictyostelium, said recombinant strain comprising a chromosomally located gene encoding a Rep polypeptide which allows replication of said recombinant plasmid.
 6. A recombinant strain of Dictyostelium as claimed in claim 5 in which the Rep polypeptide has an amino acid sequence as shown in FIG. 2 (SEQ ID NO: 3).
 7. A recombinant strain of Dictyostelium as claimed in claim 5 in which the gene encoding the Rep polypeptide has a DNA sequence as shown in FIG. 1 (SEQ ID NO: 2) from nucleotide 2378 to nucleotide 5038.
 8. A recombinant strain of Dictyostelium as claimed in claim 5 in which the gene encoding the Rep polypeptide has a DNA sequence as shown in FIG. 1 (SEQ ID NO: 2) from nucleotide 1885 to nucleotide 5292.
 9. A recombinant plasmid vector comprising an origin of replication from plasmid Ddp2 or plasmid pDG1 and at least one heterologous DNA sequence, said heterologous DNA sequence encoding a desired polypeptide and at least one promoter sequence that controls expression of said desired polypeptide, wherein said plasmid vector lacks DNA sequences encoding a Rep protein required for extrachromosomal replication in wild type Dictyostelium.
 10. A recombinant plasmid as claimed in claim 9 in which the heterologous DNA sequence includes a DNA sequence encoding a polypeptide signal for secretion of the desired polypeptide.
 11. A recombinant plasmid vector as claimed in claim 9 in which the recombinant plasmid vector includes an expression cassette comprising a promoter DNA sequence from Dictyostelium Actin 15 gene, a DNA sequence encoding the secretion signal peptide sequence of the D19 gene of the protein PsA and a DNA signal sequence for RNA polyadenylation from the Actin 15 gene.
 12. Recombinant plasmid vector pMUW102.
 13. Recombinant plasmid vector pMUW111.
 14. Recombinant plasmid vector pMUW110.
 15. Recombinant plasmid vector pMUW130.
 16. Recombinant plasmid vector pMUW1530.
 17. Recombinant plasmid vector pMUW1570.
 18. Recombinant plasmid vector pMUW1580.
 19. Recombinant plasmid vector pMUW1630.
 20. Recombinant plasmid vector pMUW1633. 